Engaging Image Chat: Modeling Personality in Grounded Dialogue

Kurt Shuster, Samuel Humeau, Antoine Bordes, Jason Weston



To achieve the long-term goal of machines being able to engage humans in conversation, our models should be engaging. We focus on communication grounded in images, whereby a dialogue is conducted based on a given photo, a setup that is naturally engaging to humans (Hu et al., 2014). We collect a large dataset of grounded human-human conversations, where humans are asked to play the role of a given personality, as the use of personality in conversation has also been shown to be engaging (Shuster et al., 2018). Our dataset, Image-Chat, consists of 202k dialogues and 401k utterances over 202k images using 215 possible personality traits. We then design a set of natural architectures using state-of-the-art image and text representations, considering various ways to fuse the components. Automatic metrics and human evaluations show the efficacy of approach, in particular where our best performing model is preferred over human conversationalists 47.7% of the time.

Model Examples

Example predictions from our best TRANSRESNET(MM-Sum) model on the human evaluation set forturn 3.  Two speakers A and B with given personality traits discuss a given photo.  The first two turns are fromhumans, and only the third turn here is completed by our model in these examples.