The brain should integrate sensory inputs only when they emanate from a common source and segregate inputs from different sources. Sensory correspondences are important cues that tell the brain whether two sensory inputs were generated by a common event and should therefore be integrated. Most prominently, sensory inputs should co-occur in time and space. More complex audiovisual stimuli may also be congruent in terms of semantics (e.g., objects and the sounds they produce) or phonology (e.g., spoken and written words, linked via common linguistic labels). Surprisingly, metaphoric relations (e.g., between pitch and height) have also been shown to influence audiovisual integration. The neural mechanisms that mediate these metaphoric congruency effects are poorly understood. They may be mediated via (i) natural multisensory binding, (ii) common linguistic labels or (iii) semantics. In this talk, we will present a series of studies that investigate whether these different types of audiovisual correspondences are processed by distinct neural systems. We further investigate which of these systems are recruited by metaphoric audiovisual correspondences. Our results demonstrate that different classes of audiovisual correspondences influence multisensory integration (MSI) at distinct levels of the cortical hierarchy. Spatiotemporal incongruency is detected already at the primary cortical level. Natural (e.g., motion direction) and phonological incongruency influence MSI in areas involved in motion and phonological processing, respectively. Critically, metaphoric interactions emerge in neural systems that are also engaged by natural and semantic incongruency. This activation pattern may reflect the ambivalent nature of metaphoric audiovisual interactions, which rely on both natural and semantic correspondences.