Seeing the Way: the Role of Vision in Conversation Turn Exchange Perception

in Multisensory Research
During conversations, we engage in turn-taking behaviour that proceeds back and forth effortlessly as we communicate. On any given day, we participate in numerous face-to-face interactions that contain social cues from our partner, and we interpret these cues to rapidly identify whether it is appropriate to speak. Although the benefit provided by visual cues has been well established in several areas of communication, the use of visual information to make turn-taking decisions during conversation is unclear. Here we conducted two experiments to investigate the role of visual information in identifying conversational turn exchanges. We presented clips containing single utterances spoken by single individuals engaged in a natural conversation with another person. These utterances either came from right before a turn exchange (i.e., when the current talker would finish and the other would begin) or were utterances after which the same talker would continue speaking. In Experiment 1, participants were presented with audiovisual, auditory-only and visual-only versions of our stimuli and identified whether a turn exchange would occur or not. We demonstrated that although participants could identify turn exchanges with unimodal information alone, they performed best in the audiovisual modality. In Experiment 2, we presented participants with audiovisual turn exchanges where the talker, the listener or both were visible. We showed that participants suffered a cost at identifying turn exchanges when visual cues from the listener were not available. Overall, we demonstrate that although auditory information is sufficient for successful conversation, visual information plays an important role in the overall efficiency of communication.

    Schematic of the three modality conditions. (A) In the AudioVisual condition, participants were presented with both the auditory and visual components of each stimulus. (B) In the Auditory-Only condition, the video was removed and participants heard only the auditory component. (C) In the Visual-Only condition, the auditory component was removed. Participants completed 40 Turn trials and 40 Non-Turn trials in each of the three counterbalanced modality-specific blocks. After each trial, participants indicated with a ‘Yes’ or ‘No’ response whether the talker had finished their turn.

    Average Turn and Non-Turn accuracy. Overall, participants performed best in the audiovisual condition compared with both unimodal conditions. In the visual-only condition, participants performed significantly worse at Non-Turn perception than at Turn perception.

    Schematic of the three viewing conditions. All conditions contained auditory information. (A) In the Talker–Listener condition, participants viewed both the talker and the listener. (B) In the Listener-Only condition, participants could not see the talker. (C) In the Talker-Only condition, participants could not see the listener. Participants viewed 40 Turn trials and 40 Non-Turn trials in each of the three counterbalanced viewing-condition blocks. After each trial, participants indicated with a ‘Yes’ or ‘No’ response whether the talker had finished their turn.

    Average Turn and Non-Turn accuracy. Overall, when participants could not see the listener (Talker-Only condition), they performed worse than in both the Talker–Listener and Listener-Only conditions. When only partial visual information was available, participants’ performance in Non-Turn perception was affected.

