Synesthesia as (Multimodal) Mental Imagery

It has been repeatedly suggested that synesthesia is intricately connected with unusual ways of exercising one’s mental imagery, although it is not always entirely clear what the exact connection is. My aim is to show that all forms of synesthesia are forms of (often very different kinds of) mental imagery and, further, if we consider synesthesia to be a form of mental imagery, we get significant explanatory benefits, especially concerning less central cases of synesthesia where the inducer is not sensory stimulation.


Introduction
Synesthesia is not a monolithic phenomenon. Some synesthetes hear a musical note and experience it as having a specific color (Ward et al., 2006). Some others experience a specific color each time they see a specific black numeral or letter on a white background (Jonas et al., 2011; Mroczko et al., 2009; Sagiv et al., 2006). Synesthesia comes in various forms: lexical-gustatory synesthesia (strong taste experiences when looking at letters, see Jones et al., 2011; Ward and Simner, 2003), colored touch synesthesia (color experiences when touching different things, see Ludwig and Simner, 2013), and spatial time units synesthesia (spatial experience when thinking about time units like the days of the week or the months of the year, see Brang et al., 2011; Jarick et al., 2011; Smilek et al., 2007). The list could go on.
Given the diversity of phenomena referred to as synesthesia, there are some definitional issues here. Synesthesia has been defined as "stimulation of one sensory domain leading to a perception in another sensory domain" (Harrison and Baron-Cohen, 1997), "the elicitation of perceptual experiences in the absence of the normal sensory stimulation" (Ward and Mattingley, 2006) or "stimulation in one sensory or cognitive streams [that] leads to associated experiences in a second unstimulated stream" (Simner, 2012). These definitions pick out slightly different sets of phenomena, but I will argue that they could all be considered to be forms of mental imagery. This is not an entirely new claim: it has been repeatedly suggested that synesthesia is intricately connected with unusual ways of exercising one's mental imagery, although it is not always entirely clear what the exact connection is. My aim is to show that all forms of synesthesia are forms of (often very different kinds of) mental imagery and, further, if we consider synesthesia to be a form of mental imagery, we get significant explanatory benefits (especially concerning less central cases of synesthesia).
Synesthesia research makes a distinction between two kinds of synesthetes: associators and projectors (Dixon et al., 2004, see also Ward et al., 2007 for finer distinctions between surface-projectors, space-projectors, see-associators and know-associators and Edquist et al., 2006 for some further wrinkles). When associators see letters printed in black, they associate colors, but they do not experience the colors of these letters 'out there' in their egocentric space. Nor do they experience the letters as having this specific color. Projectors, in contrast, do seem to see colors located where (or sometimes close to where) the black letters are located. While on the face of it associators (but not projectors) could be thought of as having color mental imagery, I will argue that all instances of synesthesia in fact count as a form of (multimodal) mental imagery.
The plan of the paper is the following. In Section 2, I outline the concept of mental imagery and multimodal mental imagery and then in Section 3, I argue that empirical findings of very different kinds all support the claim that synesthesia is a form of mental imagery. I then outline how such an explanation of synesthesia in terms of mental imagery would work in multimodal (Section 4) and unimodal (Section 5) cases. Finally, in Section 6, I argue that taking synesthesia to be a form of mental imagery has significant explanatory advantages when it comes to non-perceptually triggered forms of synesthesia.

Mental Imagery
Mental imagery, as psychologists and neuroscientists understand the concept, is early perceptual processing not triggered by corresponding sensory stimulation in the relevant sense modality (Kosslyn et al., 2006; Nanay, 2010, 2015, 2016a, b, 2018; Pearson et al., 2015). So visual imagery is early visual processing (say, in the primary visual cortex) not triggered by corresponding sensory stimulation in the visual sense modality (that is, not triggered by corresponding retinal input).
Mental imagery is defined negatively: it is early perceptual processing not triggered by corresponding sensory stimulation in the relevant sense modality. But then what is it triggered by? It can be triggered in a purely top-down manner, say, when you close your eyes and visualize an apple. But it can also be triggered laterally by other sense modalities. This is multimodal mental imagery. Multimodal mental imagery is early perceptual processing in one sense modality triggered by sensory stimulation in a different sense modality.
So if you have early auditory processing that is triggered by sensory stimulation in the visual sense modality, this counts as multimodal mental imagery. This is what happens when, for example, you watch the TV muted (e.g., Calvert et al., 1997; Hertrich et al., 2011; Pekkola et al., 2005). The auditory mental imagery will very much depend on bottom-up factors like the lip movements of the person on the screen. But not only these. If this person is someone you know or have heard speak, your auditory mental imagery will be influenced by this information. If it is Barack Obama (someone you have, presumably, heard before), you will 'hear' him speaking with his distinctive tone of voice or intonation, for example. This demonstrates nicely the importance of top-down influences on multimodal mental imagery. Conversely, if you have early visual processing that is triggered by sensory stimulation in another sense modality, this also counts as multimodal mental imagery.
Another example of multimodal mental imagery (which I will come back to) is the double flash illusion. You are presented with one flash and two beeps simultaneously (Shams et al., 2000). So, the sensory stimulation in the visual sense modality is one flash. But you experience two flashes and already in the primary visual cortex, two flashes are processed (Watkins et al., 2006). This means that the double flash illusion is really about multimodal mental imagery: in the case of the second flash, we have perceptual processing in the visual sense modality (again, already in V1) that is not triggered by corresponding sensory stimulation in the visual sense modality (but by sensory stimulation in the auditory sense modality).
I need to make some clarificatory remarks about the concept of multimodal mental imagery. First, not everybody will agree with my use of the term 'mental imagery' (which I borrow from the standard usage in psychology and neuroscience). Nothing depends on the label 'mental imagery' in the argument that follows. Those who have very strong views about how this concept should or should not be used should read the rest of the argument to be about mental imagery* (which is defined as early perceptual processing not triggered by corresponding sensory stimulation in the relevant sense modality).
Second, mental imagery may or may not be voluntary. When you close your eyes and visualize an apple, this is an instance of voluntary mental imagery. But not all mental imagery is voluntary. Unwanted flashbacks to an unpleasant scene or earworms in the auditory sense modality would be examples of involuntary mental imagery.
Third, mental imagery may or may not localize its object in one's egocentric space. When we visualize an apple, we often do so in such a way that the apple is represented in some kind of abstract visualized space, so that it would make little sense to ask whether you could reach the apple or how far the apple is from the tip of your nose. But this is, again, not a necessary feature of mental imagery. You can also visualize an apple on the keyboard of your computer.
Fourth, visualizing an apple is not normally accompanied by any feeling of presence. You are not fooled by this mental imagery into thinking that there is actually an apple in front of you so that you could reach out and grab it. But, again, this is not a necessary feature of mental imagery. There is no prima facie reason why mental imagery could not be accompanied by the feeling of presence. In fact, lucid dreaming, which is widely considered to be a form of mental imagery (see Hobbes, 1654; Walton, 1990 for a summary), is very much accompanied by the feeling of presence.
Finally, and most controversially, perceptual processing may be conscious or unconscious. We know from many experimental studies that perception, that is, sensory stimulation-driven perceptual processing, can be unconscious (Goodale and Milner, 2004; Kentridge et al., 1999; Kouider and Dehaene, 2007; Weiskrantz, 2009), so there is no prima facie reason why mental imagery, that is, nonsensory stimulation-driven perceptual processing, would have to be conscious. Just like perceptual processing in general, perceptual processing that is not triggered by corresponding sensory stimulation in the relevant sense modality may also be conscious or unconscious (see Nanay, 2018, in press a, in press b).

Synesthesia versus Mental Imagery
Let us return to the definition of synesthesia in Simner (2012): "stimulation in one sensory or cognitive streams [that] leads to associated experiences in a second unstimulated stream". In other words, if synesthetes have a color experience when hearing a certain pitch, there will be perceptual processing in their visual sense modality. Crucially from our point of view, this perceptual processing happens very early on, in most cases in the primary or secondary visual or auditory cortex (see, e.g., Hubbard et al., 2005; Jones et al., 2011; Nunn et al., 2002). As this early perceptual processing is not triggered by corresponding sensory stimulation, this is an instance of mental imagery.
Nonetheless, not everyone agrees that synesthesia is a form of mental imagery. The synesthetic experiences of projectors are routinely characterized as different from mental imagery. For example, some (e.g., Deroy and Spence, 2013) claim that the synesthetic experience of projectors is not mental imagery on the basis of introspective reports of projectors as they say that visualizing feels different from synesthetic experience. Others (e.g., Craver-Lemley and Reeves, 2013) take synesthesia to be different from mental imagery because they take mental imagery to be necessarily voluntary. We have seen that mental imagery can be involuntary and that different forms of mental imagery can 'feel' very different. In other words, the disagreement between these accounts of synesthesia and mine is merely verbal.
More generally, there have been intense debates about just what kind of experience synesthetic experience is. Is it a form of perceptual experience (Cohen, 2017; Matthen, 2017)? Is it a form of hallucination (Fish, 2010)? Or is it some kind of higher level, cognitive/linguistic experience (Simner, 2007)? The problem is that synesthesia does not really seem to fit squarely in any of these categories.
The default position about synesthetic experience, which could be described as the perceptual view, is that synesthetic experiences are perceptual experiences, maybe somewhat unusual perceptual experiences (see the definitions above from Harrison and Baron-Cohen, 1997 and Ward and Mattingley, 2006, which explicitly talk about synesthetic experiences as perceptual experiences, and see also Matthen, 2017 and Cohen, 2017 for summaries). How is the account I am defending here different from the perceptual view? In some ways, the difference is merely terminological, inasmuch as mental imagery is explicitly defined as perceptual processing that is not triggered by corresponding sensory stimulation. In other words, if synesthesia is a form of mental imagery, it is thereby a form of perceptual processing (one that is not triggered by corresponding sensory stimulation). Hence, the mental imagery view would be consistent with at least some proposals, according to which synesthetic experience is perceptual experience. It would be consistent with Matthen's view, for example, according to which perceptual experience is "accurate imagistic representation of some occurrence in the world that the subject understands as such" (Matthen, 2017, p. 166).
So everybody agrees that synesthetic experience is brought about by perceptual processing. But there is a major distinction between perceptual processing that is triggered by corresponding sensory stimulation and perceptual processing that is not triggered by corresponding sensory stimulation. I call the former 'sensory stimulation-driven perception' and the latter 'mental imagery'. As perception (as contrasted with perceptual processing) has been widely taken to entail sensory stimulation-driven perceptual processing (after all, it is the causal link via sensory stimulation that ensures the causal connection to the world, which is an essential feature of perception), taking synesthesia to be a result of not just perceptual processing, but sensory stimulation-driven perceptual processing has been the mainstream. In contrast, I will argue that synesthetic experience is not sensory stimulation-driven perception, but mental imagery. Further, taking synesthetic experience to be mental imagery (rather than sensory stimulation-driven perception) has major explanatory benefits.
In other words, considering synesthesia to be a form of mental imagery is not a merely verbal move, but it has important explanatory consequences. By pinpointing that synesthesia is a very specific kind of perceptual process, namely, one that is not triggered by corresponding sensory stimulation in the relevant sense modality, my account provides an explanatorily unified account of synesthesia, which explains the experiences of both projectors and associators as well as less central cases of synesthesia (where the inducer is not sensory stimulation-driven) as instances of mental imagery.
Here are some further reasons to think that synesthesia is a form of mental imagery. Synesthetes across the board (both associators and projectors) have more vivid mental imagery than nonsynesthetes (Amsel et al., 2017; Eagleman, 2009; Meier and Rothen, 2013; Price, 2009a, b; but see also Grossenbacher and Lovelace, 2001; Simner, 2013 for some wrinkles and exceptions). And this difference is modality-specific, so lexical-gustatory synesthesia subjects have more vivid gustatory mental imagery (but not necessarily more vivid mental imagery in the, say, auditory sense modality; Spiller et al., 2015). Further, synesthesia is very rare among aphantasia subjects (who have no or hardly any mental imagery) and relatively frequent among hyperphantasia subjects (who have very vivid mental imagery) (Zeman et al., 2015).
Some instances of synesthesia are multimodal, for example the pitch and color synesthesia I started the paper with. Some other instances of synesthesia are unimodal, for example grapheme-color synesthesia (having colored mental imagery of numerals or letters), which seems to be the most widespread form of this condition.

Multimodal Cases
I shall start with multimodal cases. Hearing a certain pitch and having visual synesthetic experience of a certain color is a clear case of multimodal mental imagery: the perceptual processing in the visual sense modality is triggered by sensory stimulation in the auditory sense modality. And seeing a letter and having the gustatory synesthetic experience of a flavor is also a clear case of multimodal mental imagery (where the perceptual processing in the gustatory sense modality is triggered by the sensory stimulation in the visual sense modality).
The question is then how this differs from other cases of multimodal mental imagery (like the example of watching Obama's speech on TV muted). The difference is that in nonsynesthetic cases of multimodal mental imagery, the crossmodal activation is explained by previous exposure or top-down influences. You 'hear' Obama's distinctive tone of voice when you watch his speech muted because you have on previous occasions heard him and seen him (on TV, presumably) at the same time (see Teufel and Nanay, 2017). So, when you now only have access to the visual part of this familiar multisensory event, you fill in the familiar auditory part of it.
In the case of synesthesia, the crossmodal activation is not explained by previous exposure. When you see the color purple each time you hear the note of high C, this is not explained by your previous exposure to purple high Cs in the past. Purple high Cs are not familiar multisensory events that you have encountered many times in the past. But that is the only difference between the synesthetic and the nonsynesthetic forms of multimodal mental imagery.
And this is true not only of associators (who often report something like involuntary visualizing experiences) but also of projectors (who do not). The self-reports of many projectors indicate that they take themselves to literally see the color of musical notes. Nonetheless, given that the visual perceptual processing of the color is not triggered by corresponding visual sensory stimulation (but rather by auditory sensory stimulation), it counts as mental imagery, not perception. The fact that synesthesia subjects can mistake one for the other indicates that the mental imagery involved in synesthesia comes with the feeling of presence (like many forms of mental imagery, as we have seen). So, the difference between projectors and associators is merely a familiar difference between different forms of mental imagery, for example, whether it localizes its object in one's egocentric space or not. As we have seen in Section 2, this is an important distinction between different instances of mental imagery and this distinction also applies within the domain of synesthetic experiences, where one standard way of describing the difference between projectors and associators is that the former experience the concurrent as located in the subject's egocentric space, whereas the latter do not (Eagleman et al., 2007). Another influential way of keeping the experiences of projectors and associators apart is to ask whether these experiences are accompanied by the feeling of presence or not (on the role of the feeling of presence in various forms of synesthesia, see Seth, 2014; van Leeuwen et al., 2011). Again, as we have seen in Section 2, some instances of mental imagery are accompanied by the feeling of presence, whereas others are not. Ditto for synesthetic experiences, where this distinction may mark the distinction between projectors and associators.
In fact, the experience of projectors is in some ways more similar to other ways of exercising multimodal mental imagery (like the Obama speech case) in terms of the feeling of presence.
Without taking sides in the complex debates about the phenomenology of projectors and associators (and, again, acknowledging that neither of these is a monolithic category, see Ward et al., 2007 for finer distinctions between surface-projectors, space-projectors, see-associators and know-associators, etc.), the crucial point is that standard distinctions between different forms of mental imagery can help us understand the difference between projectors and associators.
This explanation puts synesthesia on a continuum with other forms of multimodal mental imagery, ones we experience all the time. And this way of thinking about synesthesia is consistent with a recent set of findings, which shows that synesthesia can be artificially induced in about half of nonsynesthetes with only 5 min of sensory deprivation (Nair and Brang, 2019). When people who have never experienced synesthesia before are cut off from any kind of sensory stimulation for only 5 min, the result is that coming out of sensory deprivation, more than half of them experience some form of synesthesia (see also Brogaard and Gatzia, 2016 for other examples of artificially induced synesthesia).
If we consider synesthesia to be a form of mental imagery, these findings should not come as a surprise. We know that sensory deprivation induces perceptual processes that are not triggered by corresponding sensory stimulation, because the subjects get no sensory stimulation whatsoever and because the perceptual system keeps on functioning even in the absence of any stimulation (see Berkes et al., 2011). And these perceptual processes that are not triggered by corresponding sensory stimulation (i.e., this mental imagery) explain why subjects subsequently tend to have synesthetic experiences (i.e., mental imagery). If we take synesthetic experiences to be plain stimulus-driven perceptual experiences, no such explanation is available. In other words, taking synesthesia to be a form of mental imagery has some immediate explanatory benefits. More (maybe less immediate) explanatory benefits will be highlighted in Section 6.

Unimodal Cases
The account I outlined in the previous section is only applicable to multimodal cases of synesthesia. But how can we then explain the more widespread unimodal cases of synesthesia, like the most widespread form, grapheme-color synesthesia?
A straightforward way of extending this account of multimodal synesthesia is to say that just as multimodal cases of synesthesia happen when perceived properties across sense modalities are bound to a multisensory individual in unusual ways, unimodal cases of synesthesia happen when perceived (or perceptually processed) properties in one sense modality are bound to a unimodal sensory individual in unusual ways. So our perceptual system binds shape, size and color properties to the same unimodal, say, visual, sensory individual. And the reason for this is that most objects we see tend to have shape, size and color. When we see a banana, for example, our perceptual system tends to attribute properties of all these three kinds to it (that is, shape, size and color). But the perceptual system of some people binds color properties to unimodal sensory individuals in a way that does not correspond to past exposure to unimodal sensory individuals of this kind. Bananas tend to be yellow, so having a yellow color mental imagery when presented with a grayscale picture of a banana is something we should expect as long as we have been exposed to yellow bananas in the past. But the grapheme W does not tend to have a specific color and, crucially, our past exposure to the grapheme W is not systematically also an exposure to, say, the color purple. So given the lack of past exposure to purple Ws, it is surprising that our visual system would complete Ws with the mental imagery of the color purple. So, just as we complete multisensory individuals, we also complete unimodal sensory individuals.
I should acknowledge that this is not intended to be a full explanation of all aspects of synesthesia. I did not say anything about what causes some people and not others to bind properties to these very unusual (multi)sensory individuals. But clarifying how exactly the mental states of synesthesia subjects differ from the mental states of other subjects (that is, in the (multi)sensory individuals they bind properties perceptually to) should be an important step towards such a full explanation.
Another important consequence of my claim concerns sensory substitution. Sensory substitution is a procedure that aims to help blind subjects. A camera is installed on the subject's head, which records the scene in front of her and transforms the input in real time into either tactile or auditory stimuli. The subject, after a bit of practice, begins to navigate her environment successfully. It has been suggested that sensory substitution is a form of synesthesia (Proulx and Stoerig, 2006; Ward, 2013; Ward and Meijer, 2010; Ward and Wright, 2012; but see Farina, 2013 for criticism).
We can make sense of this suggestion without being forced to take synesthetic experiences to be similar to the experience of sensory-substituted vision (which seems very different indeed) inasmuch as in my framework both count as (multimodal) mental imagery: both are explained by early cortical perceptual processing in one sense modality that is triggered by sensory stimulation in another sense modality. We have seen why synesthesia counts as mental imagery in this sense and it is easy to see that sensory substitution is also a form of mental imagery, where tactile sensory stimulation triggers visual mental imagery (that is, early visual processing in the visual cortices, see Murphy et al., 2016; Renier et al., 2005; see Nanay, 2017 for a summary). The resulting experiences are very different (as multimodal mental imagery may manifest in very different experiences).

Explanatory Advantages
I said that taking synesthesia to be a form of (multimodal) mental imagery is not a merely verbal move. It is not just relabeling a familiar phenomenon. Taking synesthesia to be a form of mental imagery can help us understand how synesthetic experience can be triggered in various nonsensory ways.
In the cases of synesthesia I discussed so far, the synesthetic experience in a specific sense modality is triggered by sensory stimulation (in the multimodal case, by sensory stimulation in a different sense modality). But as it turns out, synesthetic experience can be induced without any sensory stimulation. And, crucially, synesthetic experience in one sense modality can be induced by mental imagery in another sense modality (by sensorily imagining something, for example, see Spiller and Jansari, 2008; Spiller et al., 2015).
In other words, it is not only, say, an auditory stimulus that can lead to visual synesthetic experience. Auditory mental imagery can also lead to visual synesthetic experience. Early cortical activation in one 'sensory stream' can trigger synesthetic experiences in a different 'sensory stream' regardless of whether this early cortical activation is triggered in a bottom-up manner (by straightforward perceptual input) or in a top-down manner (by perceptually imagining something).
Another example of multimodal mental imagery serving as the inducer of synesthetic experiences in the absence of sensory stimulation in the given sense modality comes from grapheme-color synesthetes who can have vivid color experiences by only touching the graphemes (not seeing them) (Newell, 2013). This is a puzzling finding on the face of it, but it can be explained in a straightforward manner in the present framework: this is yet another instance of synesthetic experience in one sensory stream triggered by mental imagery in another sensory stream (where this mental imagery is crossmodally triggered by a tactile stimulus). The tactile stimulus triggers visual mental imagery of the grapheme and then the visual mental imagery of the grapheme induces the synesthetic experience of color. If we take the concurrent to be mental imagery, this diverse set of synesthetic experiences can all be explained in terms of an early cortical to early cortical influence. Even more importantly, if we consider synesthesia to be sensory stimulation-driven perception, these well-documented forms of synesthesia will not count as synesthesia at all. Taking synesthesia to be mental imagery allows us to explain important, but less central cases of synesthesia as synesthesia.
Another explanatory perk of taking this route comes from some seemingly odd cases of synesthetic experiences, where the trigger is neither sensory stimulation-driven perception nor perceptual imagining but motoric imagining. The most famous example is swimming-style synesthesia: strong color experiences when seeing, thinking about or imagining a swimming style (breaststroke, crawl, butterfly, etc.) (Mroczko-Wąsowicz and Werning, 2012; Nikolić et al., 2011; Rothen et al., 2013).
In the case of swimming-style synesthesia, synesthetic experiences can be triggered in the absence of any kind of perceptual stimulus (it can happen even when your eyes are closed). But then what triggers these experiences? There seem to be two options, both of which would be compatible with the framework according to which synesthesia is a form of mental imagery (again, note that neither of these options is open to those who take synesthesia to be sensory stimulation-driven perception).
The first option is that when you think about, say, breaststroke, you involuntarily visualize a person swimming breaststroke and it is this involuntary visual mental imagery of somebody doing breaststroke that triggers the synesthetic experience (like in the perceptual imagination cases, see Spiller and Jansari, 2008; Spiller et al., 2015).
The other option is that when you think about breaststroke, you have motoric imagery of swimming in breaststroke: you imagine doing the breaststroke. This second option seems to be closer to the subjects' descriptions of their experience. And in this case, it is motor imagery (imagining doing something) that triggers mental imagery in a different 'sensory stream' (in this case, color mental imagery). So this is, just like the previous example, an imagery to imagery influence, but the former imagery is motor imagery (Nanay, in press c).

Conclusion
I argued that if we accept the proposal that synesthesia is a form of mental imagery, we get a form of explanatory unification regarding the diverse triggers of synesthetic experiences inasmuch as all of them can be explained in terms of imagery → imagery interactions. I argued that the concurrent is mental imagery. And this explains why the inducer is often also mental (or maybe motor) imagery.
I want to end with some open questions and debates where taking synesthesia to be a form of mental imagery might help us make some progress. One such open question is about the complex interactions between synesthetic experiences and some crossmodal illusions, like the double-flash illusion (see O'Callaghan, 2017; see also the references in Section 2). While some synesthetes are more likely to be fooled by crossmodal illusions like the double-flash illusion (which itself relies on multimodal mental imagery), other synesthetes are less likely to be fooled by them (and yet others are as susceptible as controls; Brang et al., 2012; Innes-Brown et al., 2011; Neufeld et al., 2012; Newell and Mitchell, 2016). Given that according to my account both synesthesia and the double-flash illusion count as (different) forms of multimodal mental imagery, we could account for these differences in terms of the interaction (and especially the time scale of the interaction, see Neufeld et al., 2012) between the different sense modalities involved. Given the complexity of the issue, this would be the subject of a different paper.