Capacity limitations of attentional resources allow only a fraction of sensory inputs to enter our awareness. Most prominently, in the attentional blink, observers fail to detect the second of two targets presented in rapid succession within a sequence of distractor items. This study investigated whether the processing of phonological (in)congruency between visual target letters and spoken letters is modulated by subjects’ awareness. In a visual attentional blink paradigm, subjects were presented with two visual targets (T1: buildings; T2: capital Latin letters) in a sequence of rapidly presented distractor items. A beep was always presented with T1. We manipulated the presence/absence and phonological congruency of the spoken letter that was presented concurrently with T2. Subjects reported the identity of T1 and T2 and rated the visibility of T2. Behaviorally, subjects correctly identified T2 when they rated it as visible or were unsure about its visibility, whereas performance was below chance level when T2 was rated as invisible. At the neural level, anterior cingulate activation followed the pattern invisible > unsure > visible T2. In contrast, visible relative to invisible trials increased activation in bilateral cerebellum, pre/post-central gyri extending into parietal sulci, and bilateral inferior occipital gyri. Incongruency effects were observed in the left inferior frontal gyrus, caudate nucleus and insula only for visible stimuli. In conclusion, phonological incongruency is processed differently when subjects are aware of the visual stimulus. This indicates that multisensory integration is not automatic but depends on subjects’ cognitive state.