Capacity limitations of attentional resources allow only a fraction of sensory inputs to enter our awareness. Most prominently, in the attentional blink, the observer fails to detect the second of two rapidly successive targets that are presented in a sequence of distractor items. This study investigated whether phonological (in)congruency between visual target letters and spoken letters is modulated by subjects’ awareness. In a visual attentional blink paradigm, subjects were presented with two visual targets (buildings and capital Latin letters, respectively) in a sequence of rapidly presented distractor items. A beep was presented always with T1. We manipulated the presence/absence and phonological congruency of the spoken letter that was presented concurrently with T2. Subjects reported the identity of T1 and T2 and reported the visibility of T2. Behaviorally, subjects correctly identified T2 when it was reported to be either visible or unsure, while performances were below chance level when T2 was reported to be invisible. At the neural level, the anterior cingulate was activated for invisible > unsure > visible T2. In contrast, visible relative to invisible trials increased activation in bilateral cerebellum, pre/post-central gyri extending into parietal sulci and bilateral inferior occipital gyri. Incongruency effects were observed in the left inferior frontal gyrus, caudate nucleus and insula only for visible stimuli. In conclusion, phonological incongruency is processed differently when subjects are aware of the visual stimulus. This indicates that multisensory integration is not automatic but depends on subjects’ cognitive state.
Attention (i.e., task relevance) and expectation (i.e., signal probability) are two critical top-down mechanisms guiding perceptual inference. Attention prioritizes processing of information that is relevant for observers’ current goals. Prior expectations encode the statistical structure of the environment. Research to date has mostly conflated spatial attention and expectation. Most notably, the Posner cueing paradigm manipulates spatial attention using probabilistic cues that indicate where the subsequent stimulus is likely to be presented. Only recently have studies attempted to dissociate the mechanisms of attention and expectation and characterized their interactive (i.e., synergistic) or additive influences on perception. In this review, we will first discuss methodological challenges that are involved in dissociating the mechanisms of attention and expectation. Second, we will review research that was designed to dissociate attention and expectation in the unisensory domain. Third, we will review the broad field of crossmodal endogenous and exogenous spatial attention that investigates the impact of attention across the senses. This raises the critical question of whether attention relies on amodal or modality-specific mechanisms. Fourth, we will discuss recent studies investigating the role of both spatial attention and expectation in multisensory perception, where the brain constructs a representation of the environment based on multiple sensory inputs. We conclude that spatial attention and expectation are closely intertwined in almost all circumstances of everyday life. Yet, despite their intimate relationship, attention and expectation rely on partly distinct neural mechanisms: while attentional resources are mainly shared across the senses, expectations can be formed in a modality-specific fashion.
Introduction: In multistable perception, the brain alternates between several perceptual explanations of ambiguous sensory signals. Recent studies have demonstrated crossmodal interactions between ambiguous and unambiguous signals. However it is currently unknown whether multiple bistable processes can interact across the senses (Conrad et al., ; Pressnitzer and Hupe, ). Using the apparent motion quartet in vision and touch, this study investigated whether bistable perceptual processes for vision and touch are independent or influence each other when powerful cues of congruency are provided to facilitate visuotactile integration (Conrad et al., in press).
Methods: When two visual flashes and/or tactile vibration pulses are presented alternately along the two diagonals of the rectangle, subjects’ percept vacillates between vertical and horizontal apparent motion in the visual and/or tactile modalities (Carter et al., ). Observers were presented with unisensory (visual/tactile), visuotactile spatially congruent and incongruent apparent motion quartets and reported their visual or tactile percepts.
Results: Congruent stimulation induced pronounced visuotactile interactions as indicated by increased dominance times and %-bias for the percept already dominant under unisensory stimulation. Yet, the temporal dynamics did not converge for congruent stimulation. It depended also on subjects’ attentional focus and was generally slower for tactile than visual reports.
Conclusion: Our results support Bayesian approaches to perceptual inference, where the probability of a perceptual interpretation is determined by combining a modality-specific prior with incoming visual and/or tactile evidence. Under congruent stimulation, joint evidence from both senses decelerates the rivalry dynamics by stabilizing the more likely perceptual interpretation. Importantly, the perceptual stabilization was specific to spatiotemporally congruent visuotactile stimulation indicating multisensory rather than cognitive bias mechanisms.
The role attention plays in our experience of a coherent, multisensory world is still controversial. On the one hand, a subset of inputs may be selected for detailed processing and multisensory integration in a top-down manner, i.e., guidance of multisensory integration by attention. On the other hand, stimuli may be integrated in a bottom-up fashion according to low-level properties such as spatial coincidence, thereby capturing attention. Moreover, attention itself is multifaceted and can be described via both top-down and bottom-up mechanisms. Thus, the interaction between attention and multisensory integration is complex and situation-dependent. The authors of this opinion paper are researchers who have contributed to this discussion from behavioural, computational and neurophysiological perspectives. We posed a series of questions, the goal of which was to illustrate the interplay between bottom-up and top-down processes in various multisensory scenarios in order to clarify the standpoint taken by each author and with the hope of reaching a consensus. Although divergence of viewpoint emerges in the current responses, there is also considerable overlap: In general, it can be concluded that the amount of influence that attention exerts on MSI depends on the current task as well as prior knowledge and expectations of the observer. Moreover stimulus properties such as the reliability and salience also determine how open the processing is to influences of attention.
The brain should integrate sensory inputs only when they emanate from a common source and segregate those from different sources. Sensory correspondences are important cues informing the brain whether two sensory inputs are generated by a common event and should hence be integrated. Most prominently, sensory inputs should co-occur in time and space. More complex audiovisual stimuli may also be congruent in terms of semantics (e.g., objects and source sounds) or phonology (e.g., spoken and written words; linked via common linguistic labels). Surprisingly, metaphoric relations (e.g., pitch and height) have also been shown to influence audiovisual integration. The neural mechanisms that mediate these metaphoric congruency effects are only poorly understood. They may be mediated via (i) natural multisensory binding, (ii) common linguistic labels or (iii) semantics. In this talk, we will present a series of studies that investigate whether these different types of audiovisual correspondences are processed by distinct neural systems. Further, we investigate how those systems are employed by metaphoric audiovisual correspondences. Our results demonstrate that different classes of audiovisual correspondences influence multisensory integration at distinct levels of the cortical hierarchy. Spatiotemporal incongruency is detected already at the primary cortical level. Natural (e.g., motion direction) and phonological incongruency influences MSI in areas involved in motion or phonological processing. Critically, metaphoric interactions emerge in neural systems that are shared with natural and semantic incongruency. This activation pattern may reflect the ambivalent nature of metaphoric audiovisual interactions relying on both natural and semantic correspondences.