Seeing the Way: the Role of Vision in Conversation Turn Exchange Perception

in Multisensory Research

During conversations, we engage in turn-taking behaviour that proceeds back and forth effortlessly as we communicate. On any given day, we participate in numerous face-to-face interactions that contain social cues from our partner, and we interpret these cues to rapidly identify whether it is appropriate to speak. Although the benefit provided by visual cues has been well established in several areas of communication, the use of visual information to make turn-taking decisions during conversation is unclear. Here we conducted two experiments to investigate the role of visual information in identifying conversational turn exchanges. We presented clips containing single utterances, each spoken by one individual engaged in a natural conversation with another. These utterances either came from right before a turn exchange (i.e., when the current talker would finish and the other would begin) or were utterances after which the same talker would continue speaking. In Experiment 1, participants were presented with audiovisual, auditory-only and visual-only versions of our stimuli and identified whether a turn exchange would occur or not. We demonstrated that although participants could identify turn exchanges with unimodal information alone, they performed best in the audiovisual modality. In Experiment 2, we presented participants with audiovisual turn exchanges where the talker, the listener or both were visible. We showed that participants suffered a cost in identifying turn exchanges when visual cues from the listener were not available. Overall, we demonstrate that although auditory information is sufficient for successful conversation, visual information plays an important role in the overall efficiency of communication.

    Schematic of the three modality conditions. (A) In the Audiovisual condition, participants received both the auditory and visual components of each stimulus. (B) In the Auditory-Only condition, the video was removed and participants only heard the auditory component. (C) In the Visual-Only condition, the auditory component was removed. Participants viewed 40 Turn trials and 40 Non-Turn trials in each of the three counterbalanced modality-specific blocks. Following presentation of each trial, participants indicated using a ‘Yes’ or ‘No’ response whether the talker had finished their turn.

    Average Turn and Non-Turn accuracy. Overall, participants performed best in the audiovisual condition compared to both unimodal conditions. In the visual-only condition, participants performed significantly worse at Non-Turn perception than at Turn perception.

    Schematic of the three viewing conditions. All conditions contained auditory information. (A) In the Talker–Listener condition, participants viewed both the talker and the listener. (B) In the Listener-Only condition, participants could not see the talker. (C) In the Talker-Only condition, participants could not see the listener. Participants viewed 40 Turn trials and 40 Non-Turn trials in each of the three counterbalanced viewing-condition blocks. Following presentation of each trial, participants indicated using a ‘Yes’ or ‘No’ response whether the talker had finished their turn.

    Average Turn and Non-Turn accuracy. Overall, when participants could not see the listener (Talker-Only condition), they performed worse than in both the Talker–Listener and Listener-Only conditions. When only partial information was visible, participants’ performance in Non-Turn perception was affected.
