Assessing audiovisual saliency and visual-information content in the articulation of consonants and vowels on audiovisual temporal perception

in Seeing and Perceiving

Research has revealed different temporal integration windows between and within different speech-tokens. The limited set of speech-tokens tested to date has not allowed for a proper evaluation of whether such differences are task- or stimulus-driven. We conducted a series of experiments to investigate how the physical differences associated with speech articulation affect the temporal aspects of audiovisual speech perception. Videos of consonants and vowels uttered by three speakers were presented. Participants made temporal order judgments (TOJs) regarding which speech-stream (auditory or visual) had been presented first. The sensitivity of participants' TOJs and the point of subjective simultaneity (PSS) were analyzed as a function of the place of articulation, manner of articulation, and voicing for consonants, and the height/backness of the tongue and lip-roundedness for vowels. The results demonstrated that, for place of articulation/roundedness, participants were more sensitive to the temporal order of highly salient speech-signals, and these signals showed smaller visual leads at the PSS. This was not the case when the manner of articulation/height was evaluated. These findings suggest that the visual-speech signal provides substantial cues to the auditory-signal that modulate the relative processing times required for the perception of the speech-stream. A subsequent experiment explored how the presentation of different sources of visual-information modulated such findings. Videos of three consonants were presented under natural and point-light (PL) viewing conditions revealing parts of, or the whole, face. Preliminary analysis revealed no differences in TOJ accuracy under different viewing conditions. However, the PSS data revealed significant differences between viewing conditions depending on the speech token uttered (e.g., larger visual-leads for PL-lip/teeth/tongue-only views).
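
To illustrate how a PSS and a sensitivity measure can be derived from TOJ data, the following is a minimal sketch and not the analysis reported in the abstract, which does not specify the fitting procedure. It assumes a common approach: proportions of "vision first" responses are converted to z-scores, a straight line is fitted across stimulus onset asynchronies (SOAs), the PSS is taken as the SOA at the 50% point, and the just noticeable difference (JND) as the SOA distance between the 50% and 75% points. The function name `fit_toj`, the sign convention, the clipping bound, and the data are all hypothetical.

```python
from statistics import NormalDist


def fit_toj(soas_ms, p_vision_first, eps=0.01):
    """Estimate PSS and JND from TOJ proportions with a probit line fit.

    soas_ms: stimulus onset asynchronies in ms (assumed convention:
        negative = auditory stream first, positive = visual stream first).
    p_vision_first: proportion of "vision first" responses at each SOA.
    eps: clipping bound so that proportions of 0 or 1 stay finite after
        the probit transform (an assumption of this sketch).
    """
    std_normal = NormalDist()
    # Probit transform: proportion -> z-score of the standard normal.
    z = [std_normal.inv_cdf(min(max(p, eps), 1.0 - eps)) for p in p_vision_first]

    # Ordinary least-squares line z = intercept + slope * SOA.
    n = len(soas_ms)
    mean_x = sum(soas_ms) / n
    mean_z = sum(z) / n
    sxx = sum((x - mean_x) ** 2 for x in soas_ms)
    sxz = sum((x - mean_x) * (zi - mean_z) for x, zi in zip(soas_ms, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x

    pss = -intercept / slope  # SOA at which z = 0, i.e. p = 0.5
    jnd = 0.675 / slope       # SOA distance from the 50% to the 75% point
    return pss, jnd


# Hypothetical data for one participant and one speech token.
soas = [-300, -200, -100, 0, 100, 200, 300]
props = [0.03, 0.10, 0.27, 0.45, 0.70, 0.90, 0.98]
pss, jnd = fit_toj(soas, props)
print(f"PSS = {pss:.1f} ms, JND = {jnd:.1f} ms")
```

Under the sign convention assumed here, a positive PSS means the visual stream had to lead the auditory stream for the two to be judged simultaneous, and a smaller JND means higher sensitivity to temporal order.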

