Exploring Group Differences in the Crossmodal Correspondences

There has been a rapid growth of interest amongst researchers in the cross-modal correspondences in recent years. In part, this has resulted from the emerging realization of the important role that the correspondences can sometimes play in multisensory integration. In turn, this has led to an interest in the nature of any differences between individuals, or rather, between groups of individuals, in the strength and/or consensuality of the cross-modal correspondences that may be observed: in neurotypically normal groups cross-culturally and developmentally, as well as across various special populations (including those who have lost a sense, as well as those with autistic tendencies). The hope is that our emerging understanding of such group differences may one day provide grounds for supporting the reality of the various different types of correspondence that have so far been proposed, namely structural, statistical, semantic, and hedonic (or emotionally mediated).


Introduction
The body of evidence supporting the existence of a wide range of cross-modal correspondences between various different attributes, features, and dimensions of sensory experience has expanded rapidly in recent years (e.g., Marks, 1974, 1978, 2004; Sathian and Ramachandran, 2020; Spence, 2011). Furthermore, the research that has been published to date has also highlighted the influence that a number of such correspondences can sometimes exert over various aspects of multisensory perception (e.g., Parise and Spence, 2009; Spence and Sathian, 2020). It is interesting, therefore, to consider whether there are also any meaningful differences between particular groups of individuals in terms of the strength and/or consensuality of the cross-modal correspondences that have been documented to date. A growing body of empirical research has recently started to highlight various differences between groups of individuals, attributable to everything from neurotypical development (e.g., Marks et al., 1987) through to cultural influences (Walker, 1987), and from the consequences of sensory loss (e.g., Hamilton-Fletcher et al., 2018; Sourav et al., 2019; though see also Barilari et al., 2018) through to autism/autistic tendencies (e.g., Chen et al., 2021).
Importantly, however, the question of differences between groups of individuals can potentially be further broken down as a function of the type of cross-modal correspondence under investigation, and perhaps even the particular pair of sensory modalities that happen to be involved. For while the majority of research published to date has tended to focus on audiovisual crossmodal correspondences (e.g., Bonetti and Costa, 2018; Guzman-Martinez et al., 2012; Orchard-Mills et al., 2013; see Spence and Sathian, 2020, for a review), researchers have also taken an interest in correspondences involving the sense of touch (see Spence, 2020a, for a review) and even the chemical senses (namely olfaction and gustation; see Belkin et al., 1997; Gilbert et al., 1996; Kemp and Gilbert, 1997; Spence, 2020b; Spence and Levitan, 2022). Spence (2011) argued for the existence of four putatively distinct kinds of cross-modal correspondence, namely structural (e.g., Stevens, 1957; Walsh, 2003; cf. Pinel et al., 2004; Yates et al., 2012), statistical (e.g., Wagner et al., 1981), semantic (Gallace and Spence, 2006; Martino and Marks, 1999, 2000, 2001; Osgood, 1960; Walker and Walker, 2016), and emotional or hedonic (e.g., Palmer et al., 2013; Spence, 2020c; see Table 1). Structural correspondences are thought to be based on common neural encoding (e.g., of magnitude or intensity; Stevens, 1957). Statistical correspondences are based on associative learning (i.e., on the internalization of the statistical regularities of the multisensory environment, such as the physical correlation between pitch and size). Semantic (or lexical; Spence and Di Stefano, 2022; Walker, 2016) correspondences are based on the same term being used to describe different dimensions of experience (as when the terms 'high' and 'low' are used to describe both the pitch of sounds and spatial elevation; see Martino and Marks, 1999).
Finally, mood-based or emotionally mediated correspondences are thought to be based on matching the mood, emotion, or hedonic valence associated with each of the component stimuli (see Palmer et al., 2013; Schifferstein and Tanudjaja, 2004; Spence, 2020c).
Relevant here, Walker-Andrews (1994) distinguished between four kinds of crossmodal mapping: namely, those based on amodal specification, such as (in her view) size, shape, texture, and substance (see also Wagner et al., 1981, on the notion of amodal specification); Arbitrary and Artificial correspondences (e.g., the ringtone on one's phone; man-made objects, she notes, mostly fall into this category); Arbitrary but Natural correspondences (such as the sound of a person's voice); and Typical intermodal relations, "describing those that are typically in the environment but are only partially specified by amodal invariants - such as heavy objects typically producing deeper and louder sounds on impact than do lighter objects" (Walker-Andrews, 1994, p. 48). One might wonder whether the latter two differ only in terms of the precision of the crossmodal match, rather than reflecting a difference in kind.

Table 1. Different kinds of crossmodal relation that have been proposed in the literature to date, including four types of crossmodal correspondence, along with amodal, metaphorical, and crossmodal semantic congruency.

Structural (physiological): Based on neural organization/coding (e.g., of stimulus intensity/magnitude; Stevens, 1957; Walsh, 2003). Spence and Di Stefano (2022) have argued that 'physiological' may be a better label than 'structural', given that the latter term has already been used to describe alignment based on the physical similarity of perceptual dimensions (e.g., hue and pitch both being circular dimensions; see Sebba, 1991).
Statistical: Based on the internalization of the statistics of the environment (e.g., the pitch-size mapping; Ernst, 2007; Spence, 2011). That said, Walker-Andrews (1994) distinguishes between physical laws (or Typical intermodal relations) and man-made objects (Arbitrary and Artificial, in her terminology).
Semantic (lexical): Based on common linguistic terms used to differentiate aspects of sensory experience (e.g., Gallace and Spence, 2006; Martino and Marks, 1999, 2000, 2001; Osgood, 1960). Walker and Walker (2012) suggest that it is the lexical, rather than the linguistic, meaning of the terms that is key to the effects reported in this case (cf. Shayan et al., 2014).
Emotionally mediated (hedonic): Based on the emotional, or affective, associations of the component stimuli, such as when paintings and musical selections are matched because they are associated with the same emotion (e.g., Palmer et al., 2013; Schifferstein and Tanudjaja, 2004; Spence, 2020c).
Amodal: Pick-up of the same environmental information by different senses (e.g., shape information detected by both vision and touch; Gibson, 1969; cf. Lewkowicz and Turkewitz, 1980; Smith, 1994).
Metaphorical: Based on the more abstract mapping of (sensory) features, such as the matching of auditory duration with visual length (Gardner, 1974; Marks et al., 1987; Wagner et al., 1981; cf. Jalal and Ramachandran, 2014).
Semantic: Based on different unisensory properties that are associated with semantically meaningful objects or concepts, such as the mapping of a barking sound with the image of a dog (e.g., Barenholtz et al., 2014; Spence, 2010, 2011). Note how this terminology conflicts with the so-called semantic category above (terminology introduced by Marks, 1999, 2000).
One might, for example, expect there to be more scope for individual differences in the case of semantic (or lexical) and/or emotional cross-modal correspondences than for those correspondences that are structural or statistical in origin (at least when the latter are based on the physical regularities of the environment). The hope amongst some researchers is therefore that any regularities in the patterns of differences observed in various groups of participants might one day help to support the reality of the distinctions that have been suggested between the various different classes of cross-modal correspondence (see Smith, 1994). At the same time, however, it should be acknowledged that it remains an unresolved question in the literature whether amodal attributes and metaphorical mappings should be listed alongside the other cross-modal correspondences (e.g., Smith, 1994; Walker, 1987; Walker-Andrews, 1994), or whether they should instead simply be subsumed within the other categories of cross-modal relation (see Spence et al., 2013, on the problematic notion of amodality in multisensory research).
Structural (or, perhaps better said, physiological; see Spence and Di Stefano, 2022, on this point) correspondences, and some statistical correspondences (namely those based on the physical regularities of the environment), such as the pitch-size association, might well be expected to be universal (e.g., Gallace and Spence, 2006; Parise and Spence, 2009; see also Peters et al., 2015; Pisanski et al., 2017). That said, those statistical correspondences that are based on more 'arbitrary' combinations of features (what Walker-Andrews, 1994, has termed 'Arbitrary and Artificial' correspondences), such as those that have been documented between smell and taste stimuli (e.g., Blank and Mattes, 1990; Spence, 2008), between colour and taste (e.g., Shankar et al., 2010; Spence et al., 2015), or between tastes and visual textures (Barbosa Escobar et al., 2022), are presumably more likely to show robust cultural variation. Consistent with such a view, Wan et al. (2014) conducted an online study (N = 452) of the cross-modal correspondences between basic tastes and abstract visual features (specifically hue category and form) in groups of participants from four countries (China, India, Malaysia, and the USA). While the results showed a high degree of cross-cultural similarity, some salient cross-cultural differences were also observed. Indeed, according to the authors: "In terms of the whole picture of color-shape-texture-taste associations, we found the four countries, China, India, Malaysia, and the USA, are quite different from each other" (Wan et al., 2014, p. 9). That being said, the authors were unable to identify any plausible factors underpinning the differences that they reported (such as Eastern or Western culture), or any meaningful differences in terms of the particular correspondences under investigation.
Levitan et al. (2014) also documented a number of cross-cultural similarities and differences in odour-colour correspondences in a study conducted simultaneously in six different countries/cultural groups (Dutch, Netherlands-residing Chinese, German, Malay, Malaysian-Chinese, and a group of people living in the USA). The researchers used representational dissimilarity analysis to assess the degree of cross-cultural similarity and difference in the patterns of colour-odour associations that were reported. They found that the German and US participants were the most similar in terms of their specific cross-modal correspondences, followed by the German and Malay participants. By contrast, the largest group differences in colour-odour cross-modal correspondences were between the Malay and Netherlands-residing Chinese groups, and between the Dutch and Malaysian-Chinese groups. Given this pattern of results, the researchers concluded that culture (i.e., in terms of different prior patterns of multisensory perceptual experience) played an important role in determining the particular colour-odour cross-modal associations that they observed.
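The representational dissimilarity approach mentioned above can be illustrated with a short Python sketch. It is a minimal, hypothetical reconstruction of the general technique, not Levitan et al.'s (2014) actual analysis: it assumes that each group's responses are summarized as an odours-by-colours matrix of choice counts, measures the dissimilarity between two odours as 1 minus the Pearson correlation of their colour profiles, and compares two groups by correlating the off-diagonal entries of their dissimilarity matrices. The function names and the randomly generated data are illustrative only.

```python
import numpy as np

def dissimilarity_matrix(assoc: np.ndarray) -> np.ndarray:
    """Odour-by-odour dissimilarity (1 - Pearson r), computed from an
    odours x colours matrix of how often each colour was chosen per odour."""
    return 1.0 - np.corrcoef(assoc)

def group_similarity(assoc_a: np.ndarray, assoc_b: np.ndarray) -> float:
    """Correlate the upper triangles of two groups' dissimilarity matrices.
    Values near 1 mean both groups structure the odours in a similar way."""
    rdm_a = dissimilarity_matrix(assoc_a)
    rdm_b = dissimilarity_matrix(assoc_b)
    upper = np.triu_indices_from(rdm_a, k=1)  # skip the all-zero diagonal
    return float(np.corrcoef(rdm_a[upper], rdm_b[upper])[0, 1])

# Hypothetical data: 6 odours x 8 colour options, choice counts per group.
rng = np.random.default_rng(seed=1)
group_a = rng.integers(0, 20, size=(6, 8))
group_b = group_a + rng.integers(0, 5, size=(6, 8))  # similar pattern plus noise
group_c = rng.permutation(group_a, axis=0)           # odour rows shuffled

print(round(group_similarity(group_a, group_b), 2))
print(round(group_similarity(group_a, group_c), 2))
```

Because only the pattern of dissimilarities is compared, the score does not change if one group's counts are simply scaled up (e.g., a larger sample); a rank-based (Spearman) comparison of the matrices is a common alternative.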
Meanwhile, in a sample of more than 5000 participants from around the world, Velasco and his colleagues were able to demonstrate that, of the six drinks shown to participants (the other five had a blue, yellow, green, orangey, and bright red appearance), the dark red one was rated as looking the sweetest, no matter which continent the participants happened to come from (see Velasco et al., 2016). That said, the correspondence was far from perfectly consensual, with only 44% of the participants choosing the dark-red drink as the sweetest of the six available options (see Note 1). Furthermore, as soon as colour is present as one of the appearance properties of a particular food or drink, it is likely to be associated with a specific product or brand (see Shankar et al., 2010). Consequently, in such cases, it can be hard to rule out the possibility that semantic knowledge (regarding the particular configuration of sensory properties that normally co-occur in a given food product, such as the brown colour, sweet taste, and vanilla-cinnamon-citrus scent of a cola drink) plays a role in driving any cross-modal associations that are documented (i.e., rather than necessarily a more abstract, and direct, correspondence between colour and taste/smell; e.g., Spence, 2020b; Spence and Levitan, 2022; see also Shankar et al., 2010; Speed et al., 2021). This may also be expected to introduce a cultural element into the cross-modal correspondences that have been reported (see Walker-Andrews, 1994, on those correspondences that are associated with, or based on, man-made objects). In such cases, the correspondences may be better considered to be semantic in nature (e.g., as in the literature on cross-modal semantic congruency; see Spence, 2010, 2011), rather than semantic cross-modal correspondences (see Martino and Marks, 1999).
Here, it is important not to confuse this sense of 'semantic' with the semantic cross-modal correspondences suggested by Marks (1999, 2000; see also Spence, 2011). This is one of the reasons why it may make more sense to label the latter as lexical correspondences (see Spence and Di Stefano, 2022, on this point).
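To put the 44% figure reported by Velasco et al. (2016) in context: with six options, uniform guessing would yield each choice about 17% of the time (1/6). The Python sketch below compares an observed proportion with that chance level. It assumes uniform guessing as the null hypothesis, pools participants across continents, and uses N = 5000 purely as a hypothetical stand-in for "more than 5000" participants; the function name is illustrative.

```python
import math

def above_chance(prop_chosen: float, n_options: int, n_participants: int) -> tuple[float, float]:
    """Return (ratio to chance, one-sample z) for the share of participants
    choosing one option out of n_options, assuming uniform guessing as the null."""
    chance = 1.0 / n_options
    ratio = prop_chosen / chance
    se = math.sqrt(chance * (1.0 - chance) / n_participants)  # standard error under the null
    z = (prop_chosen - chance) / se
    return ratio, z

# 44% chose the dark red drink out of six; N = 5000 is a hypothetical lower bound.
ratio, z = above_chance(0.44, n_options=6, n_participants=5000)
print(f"{ratio:.2f} x chance, z = {z:.1f}")
```

With samples this large, even modest departures from chance produce very large z values, so the ratio to chance (roughly 2.6 here) arguably says more about consensuality than the significance test does.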
One of the intriguing questions about the cross-modal correspondences, especially some of the more surprising cross-modal mappings for which it may be difficult to come up with a statistical explanation (e.g., Sidhu and Pexman, 2018; Spector and Maurer, 2012; Stevenson et al., 2012; Wagner et al., 1981), concerns the consensuality of the mappings that have so far been obtained (Chen et al., 2019; Spence et al., 2015; Woods et al., 2013). According to some of the research published to date, those cross-modal correspondences that people are themselves more confident about also tend to be the ones that they believe will be shared by others (e.g., see Spence et al., 2015, fig. 4c). Potentially relevant here, Koriat (2008) introduced the notion of 'consensuality' to describe the feeling that we often know how other people are likely to respond; this feeling may matter more to performance than the correct answer (assuming that there is one). Others, meanwhile, talk of people's 'metacognitive awareness' (see Chen et al., 2019). That being said (and as we will see later), an individual's feeling regarding the consensuality of a given cross-modal correspondence need not always accurately predict its strength within the general population (see Parise, 2016).
At the same time, however, it is easy to imagine how any semantic, or rather lexical, correspondences (Spence and Di Stefano, 2022; Walker, 2016) will likely depend on, or at the very least be influenced by, the differing linguistic (Wagner et al., 1981) or lexical (see Walker and Smith, 1984; Walker and Walker, 2012; cf. Glaser and Glaser, 1989) terms that are used to describe sensory experiences in different languages. One need think here only of the way in which different terms, such as 'thickness' or 'height', are used to describe the pitch of sounds in different languages (e.g., Dolscheid et al., 2014; Eitan and Timmers, 2010; Parkinson et al., 2012; Shayan et al., 2011). It is also worth considering how there may be a relevant historical component to a few such semantic correspondences, given the changing meanings, uses, and associations that terms such as 'thick' and 'bright' have had over the decades/centuries in different languages when referring, for example, to sounds (e.g., see Cohen, 1934; Mudge, 1920; Schiller, 1935, on this point).
Beyond this, there is also a sense in which affective, or emotionally mediated, correspondences might well be expected to show more individual, or cultural, variation (e.g., Palmer et al., 2013; Spence, 2020c; Velasco et al., 2015; Wang and Spence, 2017; Wang et al., 2016), given the cultural differences that have been reported in people's affective responses to unisensory stimuli (such as colours and odours; e.g., see Ayabe-Kanamura et al., 1998; Blazhenkova and Kumar, 2018; Jonauskaite et al., 2019, for representative examples). By contrast, it is easy to imagine how those cross-modal correspondences that are based on the physical regularities of the environment (Evans and Treisman, 2010), such as the correspondence between pitch and size (Gallace and Spence, 2006; Parise and Spence, 2009), would tend to be universal, given the regularity with which people will presumably be exposed to them. They might perhaps even be shared across species (e.g., see Korzeniowska et al., 2022; Ratcliffe et al., 2016). The auditory pitch-elevation correspondence is presumably also likely to be universal (Parise et al., 2014; see also Rusconi et al., 2006), as would any putatively innate correspondences, such as, possibly, that between visual brightness and auditory pitch (Haryu and Kajikawa, 2012; Ludwig et al., 2011; though see Spence and Deroy, 2012a) (Note 2). Intriguingly, researchers have also raised the possibility that certain of the cross-modal correspondences between auditory pitch (low vs high) and specific taste qualities (namely, bitter and sweet) might be shared universally. This suggestion is based on the different stereotypical orofacial gestures that these basic tastants evoke across species at birth, and on the presumed differences in the sound quality of any utterances that would accompany those gestures (see Knöferle and Spence, 2012; Spence, 2012).
Interestingly, when Parise and Spence (2012) assessed five different audiovisual cross-modal correspondences (including examples of both verbal and non-verbal sound symbolism) using a version of the Implicit Association Test (IAT), their strength, as measured by the d' score, turned out to be essentially equivalent, regardless of the correspondence being tested (see Fig. 1). By contrast, when Parise (2016) took a wide range of putative correspondences (described verbally) that the people he asked expected not to exist, he found (in an informal survey of 62 people) that many of the mappings did, in fact, elicit a consensual response (in terms of which ends of the two scales were aligned). Taken together, therefore, Parise's informal results suggest that the strength, or consensuality, of the mapping between a wide range of polar attributes/dimensions (see Parise and Spence, 2013; Smith and Sera, 1992) varies from chance-level responding (i.e., ca. 50%) up to near-perfect agreement on the most appropriate polarity alignment (see Fig. 2). That said, it should be noted that only a subset of the correspondences assessed by Parise are actually cross-modal.
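The polarity-based consensuality described above is straightforward to quantify. The following Python sketch assumes that each respondent makes a forced binary choice between the two possible alignments of a pair of polar dimensions, so that 50% is chance. The function names are hypothetical, and the exact binomial test is an illustrative choice, not a description of how Parise analysed his informal survey.

```python
from math import comb

def consensuality(n_majority: int, n_total: int) -> float:
    """Share of respondents choosing the more popular polarity alignment:
    0.5 is chance-level responding for a two-way choice, 1.0 is unanimity."""
    return max(n_majority, n_total - n_majority) / n_total

def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p): the probability that k or more
    of n respondents would pick the same alignment by chance alone."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n = 62  # size of Parise's (2016) informal survey
# Smallest majority that would be unlikely (one-sided p < .05) under pure guessing.
k_min = next(k for k in range(n // 2, n + 1) if binomial_upper_tail(k, n) < 0.05)
print(k_min, round(consensuality(k_min, n), 2))
```

With only 62 respondents, a mapping needs a clear majority (somewhat above 60% agreement) before it can be distinguished from coin-flipping, which is worth bearing in mind when interpreting informal surveys of this kind.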

On Different Kinds of Individual Difference Relevant to the Cross-Modal Correspondences
It is important to recognize how individual differences in the cross-modal correspondences might reveal themselves in a number of different ways: for instance, as a difference in the absolute number of cross-modal correspondences that are exhibited (consider here only how, while adults have been reported to exhibit a robust cross-modal correspondence between visual size and auditory loudness, younger children of 2-3 years of age apparently do not; Smith and Sera, 1992). More precisely, the strength of different cross-modal mappings can vary from chance-level responding to near-perfect agreement across a sampled population (e.g., Dailey et al., 1997; Parise and Spence, 2013). It is, though, important to highlight the fact that 'strength' might have one of several different meanings here: it might, for example, be taken to refer to an individual's own belief in the appropriateness, or feeling of rightness, of the cross-modal mapping. Alternatively, however, it might also be taken to refer to the degree of consensuality across a population (Koriat, 1975, 2008). The latter is presumably what Rader and Tellegen (1987) had in mind when they referred to the 'collective norm' response (cf. Gardner, 1974).
Strength might also be taken to refer to variation in the vividness of the association (Rader and Tellegen, 1987), possibly linked to individual differences in the vividness of mental imagery (cf. Nanay, 2017, 2018; Spence and Deroy, 2013a). This would seem to be what Rader and Tellegen described as the 'experiential component' of synaesthesia. Considering how synaesthesia has so often been defined (e.g., with synaesthetes being required to pass the 'Test of Consistency'; Baron-Cohen et al., 1993), one might also wonder whether the consistency of people's cross-modal mappings over time should be included here (cf. Sabaneev and Pring, 1929, p. 266). That said, while a few researchers have explicitly tested for such consistency (e.g., over a period of years in Gilbert et al., 1996; and over a matter of weeks in O'Mahony, 1983), this is currently by no means common practice.

Figure 1. Comparison of the effect size (d score) for vision and audition between the five Implicit Association Test (IAT) experiments assessing various different audiovisual cross-modal correspondences reported by Parise and Spence (2012). The participants had to respond to a random sequence of auditory and visual stimuli. When the corresponding stimuli were associated with the same response key, performance was better (i.e., faster and more accurate) than when non-corresponding stimuli were associated with each of the two response keys. Note that all of the cross-modal correspondences had a very similar effect size. Error bars represent the standard error of the mean.
In their study, Rader and Tellegen (1987) had 374 participants (undergraduates enrolled in an introductory psychology course at a North American university) match each of three tones (200 Hz, 1000 Hz, and 4000 Hz; as well as a range of other musical and vocal speech stimuli) with each of 11 colours (white, pink, yellow, red, orange, green, blue, purple, brown, grey, and black), identified verbally. In each matching task, the participants were given five points to distribute among the colours as they saw fit. White was most frequently matched with the 4000 Hz tone, blue (closely followed by green) with the 1000 Hz tone, while black (closely followed by brown and grey) was the most popular choice for the 200 Hz tone, suggesting a pitch-brightness/lightness mapping. The participants were also given a short questionnaire in order to assess the frequency of any visual-auditory synaesthetic experiences that they had outside of the experimental setting. While the terminology used by Rader and Tellegen is somewhat confusing, they do report "generalized and continuously distributed individual differences in the tendency to represent sounds as colors in accordance with empirically derived norms" (Rader and Tellegen, 1987, p. 981; see also Odbert et al., 1942), the latter referring to what they term a general translation tendency, or variation in the strength of a correspondence. Of course, any correspondences that involve auditory pitch as one of the corresponding dimensions might also be influenced by individual differences in holistic vs spectral listening (Schneider et al., 2005).
Indeed, over the years, a number of studies that have attempted to address individual differences in the strength of cross-modal correspondences have confusingly been described in terms of synaesthetic tendencies (e.g., Rader and Tellegen, 1987; Simpson et al., 1956; Walker and Smith, 1984). It should, however, be noted that there is also a separate question (and experimental literature) concerning whether there may be a meaningful continuum of individual differences in synaesthetic tendencies (e.g., Spence, 2016a, 2017; Lehman, 1972; Wicker, 1968; Wicker and Holahan, 1978; cf. Rader and Tellegen, 1987). However, once cross-modal correspondences are clearly distinguished from the phenomenon of synaesthesia (as argued for, at length, by Deroy and Spence, 2013a, b), the continuity question can, and presumably should, be addressed separately in the two cases (see also Cohen, 2017; Deroy and Spence, 2017).
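Rader and Tellegen's (1987) five-point allocation procedure lends itself to a simple aggregation, sketched below in Python. The sketch sums each participant's points for a given tone to rank the colours (the top-ranked colour being the modal, or 'collective norm', response) and then scores how many of a participant's points went to that modal colour. This per-participant score is a hypothetical proxy for the general translation tendency, not the measure that Rader and Tellegen actually reported; the function names and the allocations are illustrative assumptions.

```python
from collections import Counter

# The 11 colour names used by Rader and Tellegen (1987).
COLOURS = ("white", "pink", "yellow", "red", "orange", "green",
           "blue", "purple", "brown", "grey", "black")

def collective_norm(allocations: list[dict[str, int]], budget: int = 5) -> list[tuple[str, int]]:
    """Sum every participant's point allocation for one tone and rank the colours.
    The first entry is the modal ('collective norm') colour for that tone."""
    totals = Counter()
    for alloc in allocations:
        if sum(alloc.values()) != budget:
            raise ValueError(f"allocation must sum to {budget}: {alloc}")
        unknown = set(alloc) - set(COLOURS)
        if unknown:
            raise ValueError(f"unknown colour names: {unknown}")
        totals.update(alloc)  # adds this participant's points to the running totals
    return totals.most_common()

def agreement_with_norm(alloc: dict[str, int], modal_colour: str, budget: int = 5) -> float:
    """Share of one participant's points placed on the modal colour (0 to 1)."""
    return alloc.get(modal_colour, 0) / budget

# Hypothetical allocations from three participants for the 4000 Hz tone.
responses = [{"white": 5}, {"white": 3, "yellow": 2}, {"blue": 4, "white": 1}]
norm = collective_norm(responses)
modal = norm[0][0]
print(norm)
print([agreement_with_norm(r, modal) for r in responses])
```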
Taken together, therefore, the research reviewed in this section highlights how any individual differences in the strength of cross-modal correspondences may be exhibited in one (or more) of several different ways. First, the strength of a particular cross-modal correspondence may be related to the general agreement within a tested population, what Rader and Tellegen (1987) refer to as the "general translation tendency", which may vary between chance-level mapping and perfect agreement (sometimes referred to as "empirically-derived norms" or the "modal response"; Gardner, 1974). Second, researchers have occasionally considered the strength of a correspondence in terms of the vividness of the "experiential component" (Dailey et al., 1997; Rader and Tellegen, 1987). Third, the strength of cross-modal correspondences has also been conceptualized in terms of an individual's belief, or confidence, that other people will share the same mapping (Koriat, 2008; Wan et al., 2014; Woods et al., 2013). This is sometimes referred to as metacognitive awareness (Chen et al., 2019). Fourth, the consistency of a particular cross-modal mapping over time could potentially provide another measure of the strength of a given cross-modal correspondence (Gilbert et al., 1996; O'Mahony, 1983). Finally, one might consider whether the automaticity of a cross-modal correspondence (Fumarola et al., 2014; Getz and Kubovy, 2018; Spence and Deroy, 2013b), or its resistance to interference or perceptual load manipulations (see Evans, 2020), might provide yet another means of quantifying its strength.
As we will see below, however, the majority of the literature on individual differences in the cross-modal correspondences published to date has tended to focus on the first of these measures, namely on differences in the general translation tendency as a function of either normal or atypical development, or of cultural factors.
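Of the measures listed above, consistency over time is perhaps the simplest to compute. The Python sketch below assumes one categorical choice (e.g., a colour) per stimulus in each of two sessions. The kappa-like chance correction assumes that a participant guessing uniformly among the response options would match their own earlier choice with probability 1/(number of options); that correction, the function name, and the example data are hypothetical illustrations, not the procedures used by Gilbert et al. (1996) or O'Mahony (1983).

```python
def retest_consistency(session1: dict[str, str], session2: dict[str, str],
                       n_options: int) -> tuple[float, float]:
    """Return (raw agreement, chance-corrected agreement) between two sessions.

    Only stimuli tested in both sessions are compared. The correction assumes
    uniform guessing, i.e. a chance self-match rate of 1 / n_options."""
    shared = session1.keys() & session2.keys()
    if not shared:
        raise ValueError("no stimuli were tested in both sessions")
    raw = sum(session1[s] == session2[s] for s in shared) / len(shared)
    chance = 1.0 / n_options
    corrected = (raw - chance) / (1.0 - chance)  # 0 = chance level, 1 = perfect
    return raw, corrected

# Hypothetical choices by one participant among 11 colour names, weeks apart.
first = {"200 Hz": "black", "1000 Hz": "blue", "4000 Hz": "white"}
second = {"200 Hz": "black", "1000 Hz": "green", "4000 Hz": "white"}
raw, corrected = retest_consistency(first, second, n_options=11)
print(f"raw = {raw:.2f}, chance-corrected = {corrected:.2f}")
```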

Differences Between Groups of Individuals in Cross-Modal Correspondences
The last few years have seen an explosive growth of interest in any between-groups differences in people's sensitivity to a wide range of cross-modal correspondences. One of the first studies to have assessed such differences (i.e., without confounding them with synaesthesia) was reported by Walker (1987), who invited 838 people to pick the most appropriate (metaphorically speaking) of four pairs of visual stimuli for a range of pairs of auditory stimuli. Walker's results revealed that the consistency of the choices for four cross-modal correspondences/metaphorical cross-modal mappings (namely, auditory frequency with vertical placement, loudness with visual size, duration with length of horizontal line, and auditory waveform with visual pattern) was determined primarily by the participants' level of musical training, thus suggesting a cultural influence over the crossmodal correspondences that were exhibited (cf. Weinberger et al., 2022).

Cross-Cultural Similarities and Differences in the Cross-Modal Correspondences
As has been mentioned already, a growing number of cross-cultural studies now highlight the existence of both similarities and some intriguing cross-cultural differences in terms of the cross-modal correspondences that have been documented to date. Early research, for example, demonstrated the cross-cultural, and cross-linguistic, generalizability of a number of sound-symbolic correspondences (Köhler, 1929, 1947; Sapir, 1929; see also Chen et al., 2016a) (Note 3). Similarly, Bremner et al. (2013) documented a robust bouba-kiki effect amongst the Himba tribe of Kaokoland in Northern Namibia. This remote group of hunter-gatherers, without either schools or written language, made similar shape-sound matches, but different shape-taste matches, to those that had been documented in Western consumers previously. In particular, Western consumers have often been shown to exhibit a robust cross-modal correspondence between angularity and carbonation levels in water samples, as well as between increasing bitterness and angularity in chocolate samples varying in their cocoa content (cf. Ngo et al., 2011, 2012). By contrast, the Himba exhibited no such correspondence between angularity and carbonation, while choosing to associate bitterness with roundness more than with angularity (i.e., the opposite pattern of results to that documented in Western participants). As yet, there is no obvious explanation for this striking difference [though it may argue against the hedonic or emotional-mediation account of cross-modal correspondences, given the suggestion that sweetness is a universally and innately pleasant sensation (Drewnowski and Greenwood, 1983), whereas bitterness is unpleasant, and potentially signals poison (see Spence and Deroy, 2013c, for a review)].
It is perhaps worth stressing here that while the cross-modal correspondences documented between colour and taste/odour are likely based on the internalization of the relevant statistics of the environment (i.e., of the natural environment, see Foroni et al., 2016; or of the supermarket), drinks do not have a distinctive shape (Spence and Deroy, 2012b, 2013c). As such, the wide range of shape-taste cross-modal correspondences that have been documented (e.g., see Cytowic and Wood, 1982; Deroy and Valentin, 2011) may instead be based on matching the temporal qualities of tasting (meaning the way in which sweet tastes onset and offset gradually in experience, whereas sour tastes appear to onset and offset far more suddenly; see Obrist et al., 2014). At the same time, however, they may also be hedonically mediated, meaning that pleasant shapes (Larson et al., 2012) and tastes are matched (e.g., sweet and round), as are unpleasant ones (e.g., bitter and angular; see Velasco et al., 2015). Potentially relevant here, similar crossmodal mappings between angularity and basic tastes have now been reported in populations from both China and India (e.g., see Liang et al., 2016).
A subtle modulation of the sound-symbolic bouba-kiki effect has also been reported between Western and Eastern participants, from the UK and Taiwan, respectively (see Chen et al., 2016a; and see Blasi et al., 2016, and Ćwiek et al., 2021, for the latest findings on sound symbolism). The difference concerned the point at which the majority of responses switched from one sound-symbolic response to the other (i.e., from rounded to angular). Crucially, the basic polarity of the cross-modal association was the same in both groups of participants. However, as with a number of other studies that have documented a high degree of concordance in cross-modal matching in the presence of subtle cultural differences, it is hard to derive any straightforwardly meaningful conclusions regarding the appropriateness of discriminating between the various different classes of correspondence that have been proposed (see Table 1) (Note 4). Meanwhile, Shang and Styles (2017) reported that speakers of different languages match the Mandarin Chinese tones of speech sounds/vowels [i] and [u] to the angularity of visual shapes differently. In particular, according to the authors, Chinese-dominant speakers of Mandarin Chinese systematically matched a high, steady tone to the curvy shape and the falling tone to the pointy shape, whereas English speakers with no knowledge of Chinese matched the high, steady tone to the pointy shape and a low, dipping tone to the curvy shape. The existence of such cross-cultural variation is presumably most consistent with a statistical (and possibly also emotionally mediated) account of these particular cross-modal correspondences.
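The 'switch point' comparison described for Chen et al. (2016a) can be illustrated with a simple interpolation. The Python sketch below assumes that each group's responses are summarized as the proportion of 'angular' choices at each step of an ordered stimulus continuum running from rounded to angular. Fitting a logistic psychometric function would be the more usual approach; linear interpolation is used here only for transparency. The data and function name are hypothetical.

```python
from typing import Optional, Sequence

def crossover_point(levels: Sequence[float], p_angular: Sequence[float],
                    criterion: float = 0.5) -> Optional[float]:
    """Linearly interpolate the stimulus level at which the proportion of
    'angular' responses first reaches the criterion (default: 50%)."""
    points = list(zip(levels, p_angular))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # The criterion lies between two neighbouring steps (inclusive).
        if y0 != y1 and (y0 - criterion) * (y1 - criterion) <= 0:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    return None  # responses never cross the criterion

# Hypothetical data: a 5-step continuum from rounded (1) to angular (5).
levels = [1, 2, 3, 4, 5]
group_a = [0.05, 0.20, 0.45, 0.70, 0.95]
group_b = [0.05, 0.15, 0.30, 0.60, 0.90]
print(crossover_point(levels, group_a), crossover_point(levels, group_b))
```

In this made-up example both groups share the same polarity (more 'angular' responses towards the angular end), yet they differ in where the majority switches, which is the kind of subtle difference described above.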
Over the years, a number of striking cross-cultural differences have been reported in terms of olfactory-gustatory cross-modal associations (Blank and Mattes, 1990; see also Spence, 2022). For instance, Blank and Mattes reported that the association of anise and nutmeg aroma/flavour with sweetness (i.e., a basic taste quality) differed somewhat between the groups of white North American and non-white participants whom they happened to test. It has also been suggested that the almond smell of benzaldehyde might be more strongly associated with sweetness amongst Western consumers and with a savoury taste amongst Japanese consumers (see Spence, 2008). Note that, in both cases, the suggestion is that the particular taste associated with the odorant can be traced back to people's exposure to their co-occurrence in food (i.e., supporting a statistical account of such cross-modal correspondences). The same explanation was invoked earlier to account for the cross-cultural consistencies and differences that have been documented in colour-taste correspondences (e.g., see Velasco et al., 2016). Such cross-modal correspondences are based on cultural statistics, and it is worth noting that the ingredients typically combined in recipes differ by region, or culture (e.g., Jain et al., 2015; Rozin, 1983). Similarly, the specific colours (hues) given to various processed food products are fairly arbitrary; such pairings constitute an example of what Walker-Andrews (1994) calls "Arbitrary and Artificial" correspondences. That said, the ripening of many fruits and vegetables tends to be associated with a shift from green towards warmer hues, and that might be expected to be a universal environmental statistic (Foroni et al., 2016).
Taken together, the research that has been reviewed in this section demonstrates, as one might have expected, both broad agreement across cultures in the cross-modal correspondences that have been documented and some modest cultural differences, especially in the case of man-made products (e.g., food and drink). In this regard, it would be interesting to have more data explicitly comparing, across cultures, those cross-modal correspondences that are hypothesized to be based on "typical intermodal relations" (to use Walker-Andrews's, 1994, terminology) and/or on putatively amodal properties, the prediction being that a high degree of cross-cultural agreement would be obtained in the latter case.

On the Development of Cross-Modal Correspondences
According to the developmental literature, very young infants are sensitive to a number of cross-modal correspondences, including those that are sound-symbolic in nature (e.g., Haryu and Kajikawa, 2012; Lewkowicz and Turkewitz, 1980; Maurer et al., 2006; Mondloch and Maurer, 2004; Ozturk et al., 2013; Pejovic and Molnar, 2017; Peña et al., 2011; Walker et al., 2010). The literature that has been published to date shows that many of the correspondences appear to develop in a seemingly predictable sequence over the first decade or so of life; that is, sensitivity to additional cross-modal correspondences is acquired (rather than lost) with increasing age. At the same time, however, it is important to highlight that some inconsistencies have also been reported in the cross-modal correspondences demonstrated at various ages. These inconsistencies may, in part, simply reflect differences in whether explicit or implicit response measures were used, as well as various other methodological features and the modality combinations studied by the researchers concerned. (The use of dynamic vs static stimuli might also be relevant here; Jeschonek et al., 2013.) One of the challenges with assessing pre-linguistic infants' abilities has been to avoid falling into the trap of simply assuming that any cross-modal effects observed in this group should necessarily be interpreted in terms of synaesthesia (e.g., see Dailey et al., 1997; Deroy and Spence, 2013a, b; Domino, 1989; Maurer, 1997; Rader and Tellegen, 1987; Wagner and Dobkins, 2011). Another challenge has been to distinguish between cross-modal correspondences and other cross-modal mappings that may be more metaphorical in nature (Gardner, 1974; Marks et al., 1987; Wagner et al., 1981; Walker, 1987) (Note 5). Walker et al. (2010) examined preferential looking behaviour in 16 three- to four-month-old preverbal infants.
These researchers demonstrated a cross-modal correspondence linking auditory pitch (a sound changing continuously between 300 and 1700 Hz) to the oscillating height (i.e., elevation) of a visual stimulus and to the sharpness of a dynamically changing visual stimulus (in 12 out of the 16 participants). The magnitude of the looking preference was equivalent for the two stimuli, leading the authors to argue both that this reflects an unlearned aspect of perception and that neonatal perception is fundamentally synaesthetic in nature (Note 6). Meanwhile, research from Haryu and Kajikawa (2012) in Japan suggests that 10-month-old infants (N = 16) are implicitly sensitive to the cross-modal correspondence between auditory pitch and visual brightness (as assessed by a violation-of-expectation task). However, no cross-modal correspondence was observed between auditory pitch and visual size in this particular age group (N = 16). This pattern is consistent with earlier findings reported by Mondloch and Maurer (2004): in a group of twelve 30- to 36-month-old children, all matched high-frequency sounds with brighter objects, while only 80% of the sample were sensitive to the pitch-size correspondence. Such a developmental pattern was taken to support an innate basis for the pitch-brightness correspondence and a statistical account of the pitch-size mapping.
One might have expected that any putatively structural correspondences, such as that between auditory loudness and visual brightness, would emerge prior to more arbitrary and artificial statistical correspondences (cf. Loconsole et al., 2022; Walker-Andrews, 1994). According to Marks et al. (1987), certain cross-modal correspondences likely strengthen with age. The latter researchers also suggest that the first cross-modal correspondences to develop are likely to involve prothetic (e.g., magnitude) scaling, followed by the mapping of polar dimensions (Smith and Sera, 1992). There is also a sense that certain correspondences cannot be established until the underlying concept, or perceptual dimension, is understood by the developing child. In particular, Marks et al. tested 500 children aged 3.5-13.5 years on a range of audiovisual cross-modal correspondences. Importantly, pitch-loudness and pitch-brightness cross-modal correspondences were observed in all age groups (including adults). By contrast, the children only became sensitive to the auditory pitch-visual size correspondence at around 11 years of age (see Marks et al., 1987). Marks et al. were led to conclude that the latter correspondence might only be acquired as the result of multisensory experience.
According to Lewkowicz and Turkewitz (1980), the auditory loudness to visual brightness correspondence can be considered an example of the implicit amodal matching of stimulus intensity. The latter researchers claimed to have shown that, at 30 days of age, infants already treated loud sounds and bright lights as cross-modally equivalent. That said, not everyone agrees on the existence of amodal qualities (see Spence et al., 2013; Spence and Di Stefano, submitted). Whatever the nature of the relationship between loudness and brightness, the developmental research published to date would appear to show that it is one of the first correspondences to which very young children exhibit sensitivity (see Table 2). Meanwhile, Wagner et al. (1981) found no evidence of preferential looking for visual size and auditory loudness among a sample of 6-14-month-old infants, though a number of them did show sensitivity to the pairings of a broken vs continuous line with a pulsing vs continuous tone, a jagged vs smooth circle with a pulsing vs continuous tone, and an upward- vs downward-pointing arrow with a tone ascending vs descending in pitch (though the pitch range was not specified; cf. Parise et al., 2014). Intriguingly, these researchers described their findings in terms of metaphoric matches rather than necessarily as cross-modal correspondences. Nava et al. (2016) demonstrated the existence of cross-modal correspondences between increasing/decreasing auditory pitch and the vertical motion (i.e., upwards vs downwards) of a visual stimulus (as in the barber-pole illusion) in 4- to 5-year-old children. The researchers also documented weak correspondences involving the vertical movement of a tactile stimulus delivered to the participant's back, paired either with changes in pitch (audiotactile condition) or with the motion of visual objects (visuotactile condition). According to Nava et al.'s results, pitch-height correspondences for audiovisual and audiotactile combinations are weak in pre-school-aged children.
The authors suggested that the limited sensitivity of their young participants to such correspondences may have reflected limited language development and also perhaps the immature development of the auditory system. By contrast, the adults tested in this study exhibited near-perfect agreement in matching the direction of vertical motion with the direction of auditory pitch change, regardless of the pairing of modalities that was assessed, suggesting that, whatever the cause, the strength of cross-modal correspondences varies with age.
Recently, Speed et al. (2021) assessed a relatively wide range of cross-modal associations involving olfactory, auditory, and tactile stimuli in both children and adults. These researchers assessed cross-modal correspondences between auditory pitch, tactile (shape/texture) properties, colours, and a number of odours associated with familiar everyday stimuli (such as the smell of caramel, lemon, menthol, onion, and raspberry). Their results demonstrated little evidence of consensual cross-modal associations in the youngest age group (4-5 years old) for any of the modality combinations tested. At the same time, however, Speed et al. also reported that the correspondences tended to strengthen markedly in the older age groups (6-9 years, 10-18 years, and 19+ years old). The authors argued that this pattern of results was consistent with the importance of exposure to statistical associations in the environment for the establishment of cross-modal correspondences. They also argued that it highlighted the important role played by language and semantics (e.g., in the case of matching colours to odours having familiar smells, such as the smell of caramel being associated with the colour brown; see also Metatla et al., 2019; Spence, 2020b).

Table 2.
Pitch-brightness: Demonstrated in 10-month-olds by Haryu and Kajikawa (2012), and shown by all 30-36-month-olds by Mondloch and Maurer (2004; see also Marks et al., 1987)
Loudness-brightness: Available at 30 days of age (Lewkowicz and Turkewitz, 1980), though weaker in those with a higher Autism Quotient (Hidaka and Yaguchi, 2018)
Pitch-loudness: Demonstrated from 3.5 years upwards by Marks et al. (1987)
Pitch-visual size: Develops to adult-like levels only at 11 years of age. Only 80% of 30-36-month-olds show this correspondence in Mondloch and Maurer (2004), while Haryu and Kajikawa (2012) found that 10-month-olds are not sensitive to it
Pitch-shape (angularity): Available very early (in 75% of 3.5-month-olds; Walker et al., 2010)
Pitch-elevation: Available very early according to Walker et al. (2010; i.e., in 75% of 3.5-month-olds; though see also Marks et al., 1987; Nava et al., 2016)
Loudness-visual size: No evidence for this correspondence in 6-14-month-olds
Arbitrary and Artificial: Colour-taste and colour-smell correspondences show cross-cultural variation (Levitan et al., 2014; Spence et al., 2015; Velasco et al., 2016; Wan et al., 2014), and the consensuality of mapping within a culture increases with age (Speed et al., 2021)
Metaphorical cross-modal mapping: Develops to adult-like levels of performance during childhood (e.g., Wagner et al., 1981); may also be influenced by musical training (Walker, 1987)
Analogical: Individual differences in second-order reasoning between senses linked to the general intelligence factor g (Weinberger et al., 2022; see also Jewanski, 2010)

It is interesting to note that the emergence of multiple correspondences in the 6-9-year-old age group in Speed et al.'s (2021) study seems consistent with a general inflection point in multisensory development that appears to occur at around nine years of age (e.g., Gori et al., 2008; Jüttner et al., 2006; Vaught et al., 1975). For example, according to research from Gori et al., visuohaptic integration becomes optimal at around 8-10 years of age; meanwhile, according to Vaught et al., haptic-visual performance reaches a steady state at around nine years of age. Jüttner et al. report that 9-10-year-olds, but not 8-9-year-olds, show view independence. In future research, it would be especially interesting to determine whether particular classes of cross-modal correspondence are more likely than others to come online following this inflection point.
In conclusion, the developmental research that has been reviewed in this section would appear to demonstrate something of a predictable developmental trajectory in the acquisition of different (kinds of) cross-modal correspondences. While the development of sensitivity to certain correspondences undoubtedly depends on a child's acquisition of the relevant underlying concept, or perceptual dimension (e.g., 'density'; Marks et al., 1987), others may instead be based on structural (e.g., Lewkowicz and Turkewitz, 1980; Spence, 2011) or physiologically based (Spence and Di Stefano, 2022) correspondences. As mentioned already, the researchers concerned have often suggested that correspondences evidenced in the first few months of life are unlikely to have been learnt from the statistics of the environment (Lewkowicz and Turkewitz, 1980; Spector and Maurer, 2012). This has led to the suggestion that such correspondences may be innate.
Of course, such claims should not be accepted without a careful evaluation of the supporting evidence. At the same time, however, the gradual development of sensitivity to other cross-modal correspondences (Speed et al., 2021), as well as the developmental shift from a coarse to a more fine-grained mapping documented for certain audiovisual size correspondences during primary-school age (Cuturi et al., 2019), suggests a role for experience and/or sensory, intellectual, or language development (see also Goubet et al., 2018; and Table 2).

Autism Spectrum and Individual Differences in Cross-Modal Correspondences
The research that has been published to date shows that those scoring high on the autism spectrum appear to be less sensitive to certain of the correspondences that have been demonstrated in the normal population, such as the so-called bouba-kiki effect (e.g., Gold and Segal, 2017; Król and Ferenc, 2020; Stewart et al., 2016). In one of the first studies to document such individual differences, Oberman and Ramachandran (2008) provided some preliminary evidence for a reduced strength of cross-modal correspondences in a group of 10 individuals with autism spectrum disorder (ASD). In particular, their neurotypical group exhibited 88% consensuality on the bouba-kiki task, as compared to just 56% for those with ASD (chance being 50%). Subsequently, Occelli et al. (2013) conducted a larger study of the takete-maluma phenomenon in a group with autism spectrum disorders. These researchers compared the performance of a group of 37 neurotypical children/adolescents (7-19 years of age) with that of both low- and high-functioning ASD children (N = 20 and 15, respectively) on matching 14 pairs of meaningless angular vs rounded line drawings (rectilinear vs curvilinear, respectively) with 14 pairs of meaningless phoneme clusters. The results (see Fig. 3) revealed no evidence for a sound-symbolic correspondence in the low-functioning autistic group, with an intermediate level of performance documented in the high-functioning group (Note 7).
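Consensuality figures such as 88% vs 56% (against a chance level of 50% for a two-alternative choice) are easier to interpret once one considers how many judgments they rest on. The following minimal sketch, assuming that each judgment is an independent two-alternative choice, computes an exact one-sided binomial p-value for a count of consensual responses. The counts in the usage example are hypothetical, not the data reported by Oberman and Ramachandran (2008) or Occelli et al. (2013), and treating judgments as independent is a simplifying assumption.

```python
from math import comb


def consensuality(n_consensual: int, n_total: int) -> float:
    """Proportion of responses matching the majority (consensual) mapping."""
    return n_consensual / n_total


def binomial_p_greater(k: int, n: int, p: float = 0.5) -> float:
    """Exact one-sided p-value P(X >= k) for X ~ Binomial(n, p).

    With p = 0.5 this asks how surprising k or more consensual choices out of n
    would be if every two-alternative choice were made at random.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


# Hypothetical counts out of 50 two-alternative judgments.
for label, k in [("group 1", 44), ("group 2", 28)]:
    print(label, consensuality(k, 50), binomial_p_greater(k, 50))
```

With these made-up counts, 88% consensuality lies far above chance, whereas 56% would not be reliably distinguishable from chance; with far fewer judgments, even a fairly large difference in consensuality could fail to reach significance.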
In another study, Hidaka and Yaguchi (2018) investigated the relationship between autistic traits and cross-modal correspondences in a typically developing adult Japanese population (N = 87). These researchers found that, of the three types of cross-modal correspondence tested (namely, brightness-loudness, visual size-pitch, and visual elevation-pitch), the brightness-loudness correspondence was related to total ASD traits and to one sub-trait (social skill). Specifically, when the participants were split into groups with higher and lower ASD traits, those in the lower group exhibited a significantly larger cross-modal congruency effect in the speeded visual classification task, thus suggesting that they experienced more of a correspondence between the component auditory and visual stimuli than did those in the group with higher ASD traits (see Fig. 4). Relevant to the earlier discussion, it is worth noting here that brightness-loudness might reflect a structural (or, according to some, possibly amodal) correspondence (Lewkowicz and Turkewitz, 1980; Spence, 2011), whereas visual size-pitch and visual elevation-pitch are both often discussed as statistical correspondences based on the internalization of robust environmental regularities (e.g., Bernstein and Edelstein, 1971; Evans and Treisman, 2010; Gallace and Spence, 2006; Parise et al., 2014; see also Stewart et al., 2016).
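The congruency effect referred to here is, in general terms, the slowing of responses on incongruent relative to congruent cross-modal trials. The minimal sketch below, assuming per-participant reaction times (in ms) and a median split on trait scores, shows how such an effect might be computed and compared between lower- and higher-trait groups. The function names, the data, and the median-split rule are illustrative assumptions; the sketch is not Hidaka and Yaguchi's (2018) analysis pipeline.

```python
from statistics import mean, median
from typing import Sequence, Tuple


def congruency_effect(rt_congruent: Sequence[float],
                      rt_incongruent: Sequence[float]) -> float:
    """Cross-modal congruency effect (ms): mean incongruent RT minus mean congruent RT.
    Larger positive values indicate stronger interference from the mismatched pairing."""
    return mean(rt_incongruent) - mean(rt_congruent)


def split_by_trait(effects: Sequence[float],
                   trait_scores: Sequence[float]) -> Tuple[float, float]:
    """Median split on trait scores; returns (mean effect of the lower group,
    mean effect of the higher group). Scores equal to the median are assigned
    to the lower group, which is an arbitrary choice."""
    cut = median(trait_scores)
    lower = [e for e, s in zip(effects, trait_scores) if s <= cut]
    higher = [e for e, s in zip(effects, trait_scores) if s > cut]
    if not lower or not higher:
        raise ValueError("median split left one group empty")
    return mean(lower), mean(higher)


# Hypothetical participants: one congruency effect (ms) and one trait score each.
effects = [congruency_effect([500, 520], [540, 560]),  # 40 ms
           congruency_effect([480, 500], [505, 515]),  # 20 ms
           congruency_effect([510, 530], [515, 535]),  # 5 ms
           congruency_effect([495, 505], [500, 510])]  # 5 ms
traits = [10, 12, 30, 35]
print(split_by_trait(effects, traits))
```

In this invented example, the lower-trait group shows the larger congruency effect, which is the direction of the difference described in the text; a real analysis would, of course, also require an appropriate statistical test.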
Individuals with high autistic traits have recently been reported to exhibit less consensual cross-modal correspondences between visual features and basic tastes (Chen et al., 2021). In particular, these researchers conducted an online study in Japan with a convenience sample of 85 adults whom they divided into three groups (with a low, medium, or high Autism Quotient). They assessed colour-taste, shape-taste, and shape-colour correspondences (Note 8). The results revealed that the likelihood of choosing the consensual response was significantly associated with an individual's autistic traits. In particular, those participants in the high Autism Quotient group (according to the tripartite division) tended to choose fewer consensual colour-taste and shape-colour correspondences, while no difference was reported in the case of shape-taste mappings (see Fig. 5). Chen et al. put forward a statistical account of their findings. Furthermore, it is perhaps worth repeating the point made earlier that flavourful stimuli nearly always have a statistically associated colour, whereas shape (e.g., in the case of a drink) is typically determined by the receptacle in which the drink happens to be presented. Shape, in other words, is not necessarily as closely linked to taste as colour is, and the link may be more emotional in origin.
Taken together, the results reported in this section would appear to suggest that autistic individuals, and neurotypical participants with a high Autism Quotient, show weaker cross-modal correspondences (Note 9). That is, autistic tendencies might perhaps interfere with an individual's sense of which mapping/response is the consensual one. The fact that the loudness-brightness correspondence is impaired in those with autistic tendencies is a little surprising, though, given that this has been postulated to be an amodal correspondence (see Lewkowicz and Turkewitz, 1980; though see Spence and Di Stefano, submitted).

Sensory Loss and Cross-Modal Correspondences
The last few years have seen an explosion of interest concerning the status of sound-symbolic and other cross-modal correspondences in those who are blind (e.g., Bottini et al., 2019; Graven and Desebrock, 2018; Hamilton-Fletcher et al., 2018) (Note 10). For example, Hamilton-Fletcher et al. examined the role of visual experience in the emergence of various audiotactile cross-modal correspondences. Intriguingly, they observed both differences and similarities in the correspondences documented in the blind and sighted groups of participants whom they tested (N = 59 early and late blind, and 63 sighted controls). In particular, the congenitally blind participants whom they tested (N = 32) were found to be insensitive to the pitch-shape correspondence that has so often been documented in sighted participants (e.g., Marks, 1987; O'Boyle and Tarte, 1980; Parise and Spence, 2012; Walker, 2012) (Note 11). Meanwhile, the pitch-size and, to a lesser extent, pitch-weight cross-modal correspondences were unaffected by an individual's prior visual experience (namely normally sighted, early or late blind, N = 27). Furthermore, and somewhat surprisingly, the pitch-texture and pitch-softness cross-modal correspondences were only documented in the early-blind participants. Mixed data have been published concerning pitch-elevation cross-modal correspondences in sighted, late-blind, and congenitally blind participants (e.g., see Deroy et al., 2016; Eitan et al., 2012; Occelli et al., 2009). For instance, Deroy et al. failed to find any evidence for increasing pitch being associated with tactile motion along the finger in either early- or late-blind participants (onset of blindness at ≤3 or >5 years of age, respectively) using an IAT, while demonstrating an effect in the sighted participants. By contrast, Eitan et al. reported that the majority of cross-modal mappings involving musical auditory stimuli were preserved in the blind participants whom they tested. Elsewhere, Barilari et al. (2018) conducted a study on a group of 46 early-blind individuals (all of whom had become totally blind before the age of three years) and a control group of 46 normally sighted individuals. Their results showed that the early-blind participants, like the sighted controls, judged red to be heavier than yellow, though the strength of this particular cross-modal correspondence was somewhat weaker than that observed in the sighted control participants.
According to research from Sourav et al. (2019), there may be a protracted sensitive period regulating the development of cross-modal sound-shape associations in humans. These researchers investigated the role of visual experience in the development of both audiohaptic and (where possible) audiovisual cross-modal correspondences using five groups of participants. The typically sighted (N = 70) and late permanently blind (N = 12) participants showed highly robust cross-modal correspondences between sound (the spoken words 'bouba' and 'kiki') and angular vs rounded shapes that were felt (haptics) or seen (vision). By contrast, no such correspondences were shown by those in the congenital permanent blindness group (N = 15), by those individuals who had experienced a transient period of congenital blindness due to congenital bilateral dense cataracts before undergoing cataract-reversal surgery (N = 30), or by those individuals with a history of developmental cataracts (N = 24). Taken together, therefore, these results support the existence of a protracted sensitive (or critical) period during which aberrant vision would appear to prevent the acquisition of sound-symbolic shape correspondences later in life (see also Fryer et al., 2014). Late-onset permanent blindness, by contrast, does not appear to eliminate these cross-modal correspondences. In particular, ontogenetically early visual input (defined here as before 12 years of age) would appear to drive the acquisition of certain sound-symbolic associations. However, while the stimuli used by Sourav et al. are described as assessing sound-shape associations, a closer look at two of the pairs of tactile stimuli suggests that they might equally well be described as differing in surface texture as much as in shape. It would be intriguing to know whether this critical period for visual experience also affects the internalization of other cross-modal correspondences, including both structural and ubiquitous statistical associations.

Synaesthetic Tendencies and Sensitivity to Cross-Modal Correspondences
One could imagine that more creative artists/designers might be more sensitive to cross-modal correspondences than those who are less creative. That said, while creativity has often been associated with synaesthesia, it need not be (cf. Weinberger et al., 2022). Indeed, many studies would appear to have confounded cross-modal correspondences with synaesthesia. Dailey et al. (1997) conducted an intriguing study of 52 male North American undergraduate psychology students in which they assessed the strength of what they describe as "synaesthesia-like correspondences" (Dailey et al., 1997, p. 3). A median split on participants' performance on the Remote Associates Test (Mednick and Mednick, 1967) gave rise to two groups differing in their creativity. All of the participants were then presented with six pure tones (150, 300, 600, 1200, 2400, and 4800 Hz) and, separately, with 12 vowel sounds. They were shown each of six colour patches (blue, purple, green, red, orange, and yellow) and asked to rate the similarity of the auditory and visual stimuli on a 7-point scale (from 1, not at all, to 7, very much).
Although the results for the two groups on the two cross-modal matching tasks looked quite similar, statistical analysis nevertheless revealed a three-way interaction between Creativity, Pitch (or Vowel sound), and Hue. The authors interpreted this finding as showing that: "[m]ore creative participants differed significantly from less creative participants on their ratings, with more creative participants exhibiting stronger associations between colors and pure tones, vowels, and emotional terms" (Dailey et al., 1997, p. 1). That is, compared to less creative participants, more creative individuals made stronger associations between low-pitched sounds and shorter-wavelength colours, and between high-pitched tones and longer-wavelength colours. The individual difference in the strength of associations was specifically in terms of rated similarity. Importantly, however, it should be noted that prototypical colours were used in Dailey et al.'s study (here one might refer again to Table 2), meaning that the relative importance of hue, saturation, and brightness cannot easily be disentangled based on the results of this study.
While evidence on whether synaesthetes show enhanced multisensory integration relative to non-synaesthetes, and if so under what conditions, has been mixed (e.g., Brang, Williams and Ramachandran, 2012; Neufeld et al., 2012; Whittingham et al., 2014; see also Mulvenna and Walsh, 2006; Ward et al., 2006), one of the few studies to explicitly assess the strength of cross-modal correspondences in synaesthetes was reported by Lacey et al. (2016). According to these researchers, individuals with synaesthesia (N = 17) exhibited stronger sound-symbolic cross-modal correspondences (between auditory pseudowords and visual shapes).
In particular, these researchers tested the strength of cross-modal correspondences using a simplified version of the IAT (cf. Parise and Spence, 2012). Their results demonstrated that synaesthetes were more sensitive than non-synaesthetes (N = 18) to sound-symbolic cross-modal correspondences, but that there was no difference between the groups in terms of the low-level sensory associations between auditory pitch and visual size, or between auditory pitch and visual elevation.

Experimentally-Induced Correspondences
It is interesting to note that several laboratory studies have presented their participants with statistical correlations between pairs of otherwise unrelated unisensory dimensions, such as luminance and haptic stiffness (Ernst, 2007; see also Baier et al., 2006; Flanagan et al., 2008; Zangenehpour and Zatorre, 2010, for a few other examples of associative learning of arbitrary combinations of auditory and visual stimuli). For example, Ernst (2007) exposed participants to an arbitrary statistical mapping between the luminance of a visual object and its felt stiffness, a haptically ascertained stimulus property that is not correlated with luminance in the natural environment. The participants were trained with multisensory stimuli in which an artificial correlation had been introduced between the two stimulus dimensions: for some of the participants, the stiffer the object, the brighter it appeared, while this mapping was reversed for others. Ernst's results revealed a significant change in participants' discrimination performance when their responses to congruent and incongruent pairs of haptic stimuli were compared before and after training. These changes were attributed to a change in the distribution of the coupling prior. The training took place in a single testing session lasting between 1.5 and 2.5 hours. Presumably, the coupling priors for those stimulus dimensions that have been correlated over the course of a person's lifetime are correspondingly stronger.
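The notion of a coupling prior can be made concrete with a simplified Gaussian model in the spirit of Ernst's (2007) account. The sketch below assumes that the two stimulus properties (e.g., luminance and stiffness) have already been expressed on a common scale, that each noisy measurement has a Gaussian likelihood, and that the coupling prior is Gaussian around the line s2 = slope * s1 and flat along it (slope = +1 for "stiffer is brighter", -1 for the reversed mapping). It returns the maximum a posteriori estimates. The function and parameter names are illustrative, and this is not Ernst's own implementation.

```python
from typing import Tuple


def coupled_estimates(x1: float, x2: float, sigma1: float, sigma2: float,
                      sigma_coupling: float, slope: float = 1.0) -> Tuple[float, float]:
    """MAP estimates (s1, s2) under a Gaussian coupling prior (illustrative model).

    x1, x2         : noisy measurements of the two properties (common scale assumed)
    sigma1, sigma2 : SDs of the two Gaussian likelihoods (sensory noise)
    sigma_coupling : SD of the prior around the line s2 = slope * s1;
                     small values mean strong coupling, large values weak coupling
    """
    a = 1.0 / sigma1 ** 2          # reliability of measurement 1
    b = 1.0 / sigma2 ** 2          # reliability of measurement 2
    c = 1.0 / sigma_coupling ** 2  # strength of the coupling prior
    m = slope
    # Setting the gradient of the log posterior to zero gives two linear equations:
    #   (a + m^2 c) s1 - m c s2 = a x1
    #   -m c s1 + (b + c) s2    = b x2
    # which are solved here with Cramer's rule.
    det = a * b + a * c + m * m * b * c
    s1 = (a * (b + c) * x1 + m * b * c * x2) / det
    s2 = (b * (a + m * m * c) * x2 + m * a * c * x1) / det
    return s1, s2


# Conflicting measurements (x1 = 0, x2 = 1) with equal noise, for three prior strengths.
for sc in (100.0, 1.0, 0.01):
    print(sc, coupled_estimates(0.0, 1.0, 1.0, 1.0, sc))
```

As sigma_coupling shrinks, the two estimates are pulled towards each other (in this example, towards 0.5 each, i.e., full fusion); as it grows, each estimate stays close to its own measurement. On this reading, the suggestion that lifelong correlations produce stronger coupling priors would correspond to a smaller sigma_coupling.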
Given that relatively short associative training regimes have been shown to bias multisensory integration across a range of behavioural/perceptual tasks (see also Brunel et al., 2015; Connolly, 2014; Tong et al., 2021) (Note 12), such training protocols may be said to induce differences between individuals in their sensitivity to the specific (i.e., trained) correspondence, which is presumably internalized as a coupling prior in terms of Bayesian decision theory (Parise and Spence, 2013; see also Körding et al., 2007, on the causal inference model of multisensory integration). That said, it is not clear how long such cross-modal correspondences last after the study has been completed. As such, the researchers behind these studies might best be considered to have induced a phasic cross-modal correspondence, in contrast to the tonic cross-modal correspondences that have typically been documented elsewhere in the literature (Note 13). Such short-lasting, or temporary, correspondences can presumably also be considered weak. Alternatively, however, one might want to distinguish between such temporarily induced correspondences and other, seemingly more long-lasting, correspondences as a difference in kind.
In the context of multisensory flavour perception, Stevenson and his colleagues have also conducted a number of studies demonstrating that unfamiliar odorants can take on the taste properties with which they happen to have been associated (see Stevenson and Boakes, 2004; Stevenson et al., 1998). Importantly, these researchers demonstrated that this can occur after only a surprisingly small number of co-exposures (i.e., no more than three), and even without participants necessarily being aware of the tastant's presence in the combined stimulus. Taken together, the research that has been reviewed in this subsection clearly demonstrates that new cross-modal correspondences can be acquired rapidly given exposure to the appropriate environmental statistics. However, as yet there is little evidence concerning how long-lasting such correspondences may be (persistence possibly being considered one element of the strength of a cross-modal correspondence). There may also be fundamental differences in the rate of learning across different senses (with odour-taste correspondences seemingly learnt much more rapidly than other modality combinations) that may override any differences that one might want to study in terms of the kind of correspondence (e.g., when comparing statistical with structural correspondences).
Nevertheless, it is easy to imagine that it would be much more difficult to change, or update, a structural cross-modal mapping, such as the putative correspondence between stimulus intensity in different senses (e.g., if participants were exposed to an artificial environment in which auditory loudness was inversely correlated with visual brightness), than to update a statistical correspondence that is based on a weak statistical relationship between two stimulus dimensions. In this context, one might also consider putatively amodal stimulus dimensions, such as shape, which provide highly correlated visual and tactile information.

Conclusions
As the literature on the cross-modal correspondences continues to flourish (e.g., see Spence, 2011; Spence and Sathian, 2020, for reviews), the interest of several groups of researchers has started to move beyond merely documenting the existence of an ever-expanding range of (more-or-less surprising) correspondences between attributes, features, or dimensions of experience in the different senses. Increasingly, researchers are starting to assess the ubiquity of such cross-modal correspondences across different groups of individuals. This rapidly emerging body of research has undoubtedly helped to highlight a number of intriguing developmental (and neuro-atypical) trends in the appearance of various cross-modal correspondences (e.g., Gold et al., 2022; Marks et al., 1987; Nava et al., 2016; Spector and Maurer, 2012; Speed et al., 2021). According to Marks et al., a number of the cross-modal correspondences that are based on more complex perceptual dimensions (or that are acquired as a result of experience) simply do not reach adult-like levels of consistency until the second decade of life (see also Cutietta and Haggerty, 1987). As far as the consequences of sensory loss are concerned, the evidence is currently rather mixed, with the results depending on the method used to study the correspondence, the nature of the correspondence under investigation, and the nature of the sensory loss (e.g., Hamilton-Fletcher et al., 2018; Sourav et al., 2019). In particular, intriguing research from Sourav et al. has convincingly demonstrated the existence of a critical period for visual experience in the establishment of certain sound-symbolic pseudoword-shape correspondences. Autistic tendencies also appear to influence the strength of various cross-modal correspondences (e.g., Chen et al., 2021; Hidaka and Yaguchi, 2018; Oberman and Ramachandran, 2008; Occelli et al., 2013).
Individual differences in synaesthetic tendencies (or creativity), as well as in musical experience (i.e., a cultural factor), have also been shown to influence the strength of various cross-modal correspondences (e.g., Dailey et al., 1997; Rader and Tellegen, 1987; Walker, 1987; see also Chun and Hupé, 2016; Weinberger et al., 2022).
If one accepts the distinction between multiple different causes (or types) of cross-modal correspondence (see Spence, 2011; Spence and Di Stefano, 2022; Spence and Sathian, 2020; and see Table 1 for a summary), then it is interesting to consider whether certain kinds of cross-modal correspondence (namely, structural or physiological, statistical, semantic or lexical, and emotional or hedonic) are more or less subject to different kinds of individual difference (see Chen et al., 2021; Hidaka and Yaguchi, 2018; Parise, 2016, on this theme; and see Table 2). As mentioned earlier, it is also important to stress the various ways in which such individual differences might express themselves: namely, as differences in the strength of particular correspondences, as differences in their vividness or naturalness, and/or as differences in whether a particular correspondence is experienced at all. Intriguingly, the evidence is currently rather mixed with regard to the extent to which the consensuality of various cross-modal correspondences tracks (or is correlated with) their strength (e.g., see Chen et al., 2016a; Parise, 2016; Wan et al., 2014, on this theme).
At the same time, however, one current challenge is the relatively limited amount of data that have been collected to date on group-level differences between individuals. Moreover, there would appear to be an assumption amongst researchers that various correspondences are universal/amodal, and hence the country in which the data are collected is treated as irrelevant (e.g., as when brightness-loudness correspondences are studied in different countries). Another challenge concerns the continuing uncertainty over the very existence of several of the different types of cross-modal correspondence that have been proposed (see Spence, 2011; Spence and Sathian, 2020) (Note 14). Potentially relevant here, the latest evidence would appear to support a distinction between sound-symbolic cross-modal correspondences and those that are based on more basic perceptual features (e.g., Bottini et al., 2019; Lacey et al., 2016; Margiotoudi et al., 2022; see also Hung et al., 2017).
Unfortunately, the confusion introduced by a number of earlier researchers around the theme of so-called individual differences in synaesthetic tendencies has not helped to clarify matters in this area (see Dailey et al., 1997; Domino, 1989; Rader and Tellegen, 1987). Nevertheless, as the importance of the cross-modal correspondences to those trying to understand multisensory integration/perception becomes ever clearer (see Miller et al., 1958; Parise and Spence, 2009), the various causes, and consequences, of such individual/group differences will likely only grow in relevance. In the years ahead, it is to be hoped that progress will help to clarify both which types of cross-modal correspondence are involved and how such group-level differences between individuals express themselves (Parise, 2016).

Notes
1. Chance level responding would be expected to give 1/6, or c. 17% agreement in this case.
2. That said, universal should certainly not be treated as being synonymous with innate.
3. The only exception here is for those languages in which pseudoword phonemes are either absent or else violate phonotactic rules (see Rogers and Ross, 1975; Styles and Gawne, 2017).
4. Though, once again, there is still no plausible explanation for the cross-cultural differences that have been reported (e.g., Bremner et al., 2013).
5. Oftentimes, though by no means always, either label has been given to one and the same cross-modal association.
7. Taken together, it is unclear whether the lack of consensual mapping in these autistic individuals is specifically linked to the use of verbal stimuli, given the suggestion of reduced neural activity in Broca's area in autistic individuals.
8. Note that there were 11 colours, 15 shapes, and five basic tastes to choose from.
9. One can only wonder how a weak 'theory of mind' (Baron-Cohen, 1995) might influence cross-modal correspondences that are based on the consensuality of the mapping.
10. Cf. Chen et al. (2016b) on colour-shape associations in both deaf and hearing people.
11. And which may also be available in the statistics of the environment according to Walker et al. (2010), given that harder objects tend to shatter into more angular pieces and also make a higher-pitched sound when struck.
12. Note that Hidaka and Yaguchi (2018) also demonstrate that those with a higher Autism-Spectrum Quotient (AQ), specifically on its attention-switching component (or trait), were slower to learn, through association, a new cross-modal correspondence between the direction of horizontal visual apparent motion and pitch pairs (i.e., low-high, or vice versa).
13. That said, no one has yet, at least not as far as I am aware, suggested a 'Test of Consistency' for the cross-modal correspondences equivalent to that used by those researchers interested in synaesthesia (e.g., see Baron-Cohen et al., 1993).
14. See, for example, Sadaghiani et al. (2009) for one attempt to use neuroimaging to distinguish between natural, metaphorical, and linguistic correspondences.