Crossmodal correspondences refer to the systematic associations often found across seemingly unrelated sensory features from different sensory modalities. Such phenomena constitute a universal trait of multisensory perception even in non-human species, and seem to result, at least in part, from the adaptation of sensory systems to natural scene statistics. Despite recent developments in the study of crossmodal correspondences, there are still a number of standing questions about their definition, their origins, their plasticity, and their underlying computational mechanisms. In this paper, I will review such questions in the light of current research on sensory cue integration, where crossmodal correspondences can be conceptualized in terms of natural mappings across different sensory cues that are present in the environment and learnt by the sensory systems. Finally, I will provide some practical guidelines for the design of experiments that might shed new light on crossmodal correspondences.
For more than a century now, researchers have acknowledged the existence of crossmodal congruency effects between dimensions of sensory stimuli in the general (i.e., non-synesthetic) population. Such phenomena, known by a variety of terms including ‘crossmodal correspondences’, involve individual stimulus properties, rely on a crossmodal mapping of unisensory features, and appear to be shared by the majority of individuals. Over the last few years, a number of studies have shed light on many key aspects of crossmodal correspondences, ranging from their role in multisensory integration and their developmental trajectories to their occurrence in non-human mammals, their neural underpinnings, and the role of learning. I will present a brief overview of the latest findings on crossmodal correspondences, highlight standing questions, and provide directions for future research.
Sensory information is inherently ambiguous, and a given signal can in principle correspond to infinitely many states of the world. A primary task for the observer is therefore to disambiguate sensory information and accurately infer the actual state of the world.
Here, we take the stream–bounce illusion as a tool to investigate perceptual disambiguation from a cue-integration perspective, and explore how humans gather and combine sensory information to resolve ambiguity.
In a classification task, we presented two bars moving in opposite directions along the same trajectory and meeting at the centre. We asked observers to classify such ambiguous displays as streaming or bouncing. Stimuli were embedded in dynamic audiovisual noise so that, through a reverse correlation analysis, we could estimate the perceptual templates used for the classification. Such templates, the classification images, describe the spatiotemporal statistical properties of the noise that are selectively associated with one percept or the other. Our results demonstrate that the features of both the visual and the auditory noise, and interactions thereof, strongly biased the final percept towards streaming or bouncing.
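The logic of the classification-image analysis can be sketched in a few lines of Python. Everything below (the one-dimensional noise, the simulated observer, the trial counts) is an illustrative assumption, not the actual stimuli or analysis; the point is only how a perceptual template can be recovered from trial-by-trial noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy reverse-correlation setup: many trials of dynamic noise,
# here reduced to one value per time frame for simplicity.
n_trials, n_frames = 5000, 20
noise = rng.normal(size=(n_trials, n_frames))

# Hypothetical internal template: early frames push towards 'bounce'.
template = np.linspace(1.0, 0.0, n_frames)

# Simulated observer: the decision variable is the match between the
# noise and the template, plus internal noise.
decision_var = noise @ template + rng.normal(scale=2.0, size=n_trials)
responses = decision_var > 0  # True = 'bounce', False = 'stream'

# Classification image: mean noise on 'bounce' trials minus mean noise
# on 'stream' trials; it recovers the shape of the internal template.
classification_image = noise[responses].mean(axis=0) - noise[~responses].mean(axis=0)
```

Up to a scale factor and estimation noise, `classification_image` is proportional to `template`, which is why averaging the noise by response reveals which stimulus features drive each percept.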
Computationally, participants’ performance is explained by a model involving a matching stage, in which the perceptual system cross-correlates the sensory signals with the internal templates, and an integration stage, in which the matching estimates are linearly combined to determine the final percept. These results demonstrate that observers rely on the same MLE-like integration principles for categorical stimulus properties (stream/bounce decisions) as they do for continuous estimates (object size, position, etc.).
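A minimal sketch of this two-stage account, assuming a zero-lag normalized correlation for the matching stage and arbitrary illustrative weights for the linear integration stage (the fitted model parameters are not reproduced here):

```python
import numpy as np

def match(signal, template):
    """Matching stage: correlate the sensory signal with the internal
    template (zero-lag normalized correlation, a simplifying assumption)."""
    return signal @ template / (np.linalg.norm(signal) * np.linalg.norm(template))

def integrate(m_visual, m_auditory, w_visual=0.6, w_auditory=0.4):
    """Integration stage: linearly combine the matching estimates.
    The weights here are illustrative, not fitted values."""
    return w_visual * m_visual + w_auditory * m_auditory

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 100)

# Hypothetical 'bounce' template: energy concentrated near the crossing.
bounce_template = np.exp(-((t - 0.5) ** 2) / 0.005)

# Noisy visual and auditory signals that both contain the template.
visual = bounce_template + 0.3 * rng.normal(size=t.size)
auditory = bounce_template + 0.5 * rng.normal(size=t.size)

decision = integrate(match(visual, bounce_template), match(auditory, bounce_template))
percept = "bounce" if decision > 0 else "stream"
```

Because each matching estimate is a bounded correlation, the linear combination at the integration stage plays the same role as the weighted averaging of cues in standard MLE models of continuous estimation.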
Finally, the time-course of the classification images reveals that most of the decisional weight for disambiguation is assigned to information gathered before the physical crossing of the stimuli, thus highlighting the predictive nature of perceptual disambiguation.
The association between auditory pitch and spatial elevation is one of the most fascinating examples of cross-dimensional mappings: in a wide range of cognitive, perceptual, attentional and linguistic tasks, humans consistently display a positive, sometimes absolute, association between auditory pitch and spatial elevation. However, the origins of such a pervasive mapping are still largely unknown.
Through a combined analysis of environmental sounds and anthropometric measures, we demonstrate that, statistically speaking, this mapping is already present in both the distal and the proximal stimulus. Specifically, in the environment, high sounds are more likely to come from above; moreover, due to the filtering properties of the external ear, sounds coming from higher elevations have more energy at high frequencies.
Next, we investigated whether the internalized mapping depends on the statistics of the proximal, or of the distal stimulus. In a psychophysical task, participants had to localize narrow band-pass noises with different central frequencies, while head- and world-centred reference frames were put into conflict by tilting participants’ body orientation. The frequency of the sounds systematically biased localization in both head- and world-centred coordinates, and, remarkably, in agreement with the mappings measured in both the distal and proximal stimulus.
These results clearly demonstrate that the cognitive mapping between pitch and elevation mirrors the statistical properties of the auditory signals. We argue that, on a shorter timescale, humans learn the statistical properties of auditory signals, while, on a longer timescale, the evolution of the acoustic properties of the external ear itself is shaped by the statistics of the acoustic environment.
Humans are equipped with multiple sensory channels that provide both redundant and complementary information about the objects and events in the world around them. A primary challenge for the brain is therefore to solve the ‘correspondence problem’, that is, to bind those signals that likely originate from the same environmental source, while keeping separate those unisensory inputs that likely belong to different objects/events. Whether multiple signals have a common origin or not must, however, be inferred from the signals themselves through a causal inference process.
Recent studies have demonstrated that cross-correlation, that is, the similarity in temporal structure between unimodal signals, represents a powerful cue for solving the correspondence problem in humans. Here we provide further evidence for the role of the temporal correlation between auditory and visual signals in multisensory integration. Capitalizing on the well-known fact that sensitivity to crossmodal conflict is inversely related to the strength of coupling between the signals, we measured sensitivity to crossmodal spatial conflicts as a function of the cross-correlation between the temporal structures of the audiovisual signals. Observers’ performance was systematically modulated by the cross-correlation, with lower sensitivity to crossmodal conflict being measured for correlated as compared to uncorrelated audiovisual signals. These results therefore provide support for the claim that cross-correlation promotes multisensory integration. A Bayesian framework is proposed to interpret the present results, whereby stimulus correlation is represented in the prior distribution of expected crossmodal co-occurrence.
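The coupling cue itself is straightforward to quantify. The sketch below, a toy illustration with made-up signals, takes the peak of the normalized cross-correlation between two temporal profiles as a proxy for the strength of audiovisual coupling; the `max_crosscorr` helper and all signal parameters are assumptions, not the actual stimuli or analysis.

```python
import numpy as np

def max_crosscorr(a, v):
    """Peak of the normalized cross-correlation over all lags,
    used here as a simple index of audiovisual coupling."""
    a = (a - a.mean()) / a.std()
    v = (v - v.mean()) / v.std()
    return np.correlate(a, v, mode="full").max() / a.size

rng = np.random.default_rng(2)
shared = rng.normal(size=200)  # common temporal structure

visual = shared + 0.5 * rng.normal(size=200)
aud_correlated = shared + 0.5 * rng.normal(size=200)  # same source
aud_uncorrelated = rng.normal(size=200)               # independent source

# Correlated audiovisual pairs yield a higher coupling index; on the
# account above, such pairs are bound more strongly, and observers
# should accordingly be less sensitive to spatial conflict between them.
```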
Our body is made of flesh and bones. We know it, and in our daily lives all the senses — including touch, vision, and audition — constantly provide converging information about this simple, factual truth. But is this necessarily always the case? Here we report a surprising bodily illusion demonstrating that human observers rapidly update their assumptions about the material qualities of their body, based on their recent multisensory perceptual experience. To induce an illusory misperception of the material properties of the hand, we repeatedly and gently hit participants’ hand, while progressively replacing the natural sound of the hammer against the skin with the sound of a hammer hitting a piece of marble. After five minutes, the hand started feeling stiffer, heavier, harder, less sensitive, and unnatural, and showed enhanced Galvanic skin response to threatening stimuli. This novel bodily illusion, the ‘Marble-Hand Illusion’, demonstrates that the experience of the material of our body, surely the most stable attribute of our bodily self, can be quickly updated through multisensory integration.