Shitsukan — the Multisensory Perception of Quality

We often estimate, or perceive, the quality of materials, surfaces, and objects, what the Japanese refer to as 'shitsukan', by means of several of our senses. The majority of the literature on shitsukan perception has, though, tended to focus on the unimodal visual evaluation of stimulus properties. In part, this presumably reflects the widespread hegemony of the visual in the modern era and, in part, it results from the growing interest, not to mention the impressive advances, in digital rendering amongst the computer graphics community. Nevertheless, regardless of such an oculocentric bias in so much of the empirical literature, it is important to note that several other senses often contribute to the impression of the material quality of surfaces, materials, and objects as experienced in the real world, rather than just in virtual reality. Understanding the multisensory contributions to the perception of material quality, especially when combined with computational and neural data, is likely to have implications for a number of fields of basic research, as well as being applicable to emerging domains such as, for example, multisensory augmented retail, not to mention multisensory packaging design.

Komatsu and Goda's (2018) review covers both behavioural data and the recently emerging body of neuroimaging literature that has attempted to determine the neural substrates responsible for the perception of various aspects of material quality, such as visual glossiness (Hunter, 1975; see Chadwick and Kentridge, 2015, for a review), translucency (Chadwick et al., 2017), roughness, and/or iridescence (Sharan et al., 2014; see Anderson, 2011, and Fleming, 2014, for reviews). There is also interest in the visual perception of different classes of materials, such as woods, plastics, pearls, etc. (e.g., Tani et al., 2014).
The majority (but by no means all) of the references cited by Komatsu and Goda (2018) deal only with unimodal studies of visual shitsukan at the behavioural/psychophysical and/or neural levels. While such a visual bias is, of course, widespread across many areas of science (see Hutmacher, 2019, for a recent review; and Fraser, 1892, for a much earlier commentary), the very ubiquity of this visual hegemony (see Levin, 1993; Mirzoeff, 1999) certainly does not mean that it should go unchallenged. This visual bias in shitsukan research is presumably driven, at least in part, by the growing interest, not to mention the impressive advances, in computer graphics and the visual rendering of complex material properties such as glossiness and iridescence. As such, Komatsu and Goda's (2018) review would seem to be implicitly targeted at the computer graphics and computational vision communities. By contrast, the present review is directed more toward those working at the interface of augmented retail and other more product-related applications, such as, for example, the development of multisensory product packaging.
While there has undoubtedly been some progress in the field of haptic rendering (e.g., see Lin and Otaduy's, 2008, volume on this theme; see also Bicchi et al., 2008; Salisbury et al., 1995) (see Note 1), the field of haptic or, for that matter, auditory (see Aramaki et al., 2011; Klatzky et al., 2000; Serafin et al., 2011) digital rendering has not advanced anything like as much as might have been hoped, even over the last decade (e.g., Jones and Ho, 2008; see Spence, 2014, or Parisi, 2018, for reviews of the tactile/haptic domain). There are a number of reasons for this asymmetry. In part, it presumably relates to limitations in the available technology for tactile/haptic stimulation. However, it is also worth noting that the estimated bandwidths of the visual and tactile modalities are radically different (see Table 1 for a summary). What is also unique about haptic rendering is that it is bidirectional and, as such, the bandwidth of the feedback loop is critical to the fidelity of what is rendered (e.g., Salisbury et al., 2004). One might also consider here how much commercial interests, and their associated investments, play into this space. That said, it is worth noting that touch-screen technology has rapidly been incorporated into everyday devices in recent years (at least three billion touch-enabled devices worldwide by 2015, according to Immersion Technology; see https://www.immersion.com/3-billion-devices-have-touch/).

Table 1. The estimated bandwidth of each of the senses, in terms of the number of receptors, the number of afferent nerve fibres, the information flow at the receptors, and the psychophysically estimated channel capacity (from Zimmerman, 1989), together with the putative percentage of attentional capture (from Heilig, 1992) and the percentage of neocortex devoted to each sensory modality (Felleman and Van Essen, 1991; though see Hsiao, 1998, for similarities between vision and touch). (Reprinted from Gallace et al., 2012.)

Sense      Receptors   Afferent fibres   Bandwidth (bits/s)   Channel capacity (bits/s)   Attentional capture   Neocortex
Vision     2*10^8      2*10^6            10^7                 40                          70%                   55%
Audition   3*10^4      2*10^4            10^5                 30                          20%                   3.4%
Touch      10^7        10^6              10^6                 5                           4%                    11.5%
Taste      3*10^7      10^3              10^3                 1(?)                        1%                    0.5%
Smell      7*10^7      10^5              10^5                 1(?)                        5%                    n.a.
There are currently a number of intriguing possibilities as far as augmented retail applications are concerned, where stimulating more than just the consumer's eyes holds the promise of increasingly engaging multisensory applications (e.g., see Heller et al., 2019; Leswing, 2016; Overmars and Poels, 2015; Xiao et al., 2016; see Petit et al., 2019, for a review). To be absolutely clear, the primary audience for this particular review is those working (or interested) in applied multisensory domains, including those considering how best to render material perception, as well as the rapidly emerging field of multisensory packaging design (see Velasco and Spence, 2019, for reviews). Neuroimaging studies of multisensory shitsukan will not be covered in any detail here, in part because there is little that specifically relates to multisensory shitsukan perception, and in part because the unisensory, primarily visual, literature has been summarized so thoroughly by Komatsu and Goda (2018).

Non-Visual Contributions to Shitsukan
It can be argued that what is missing from Komatsu and Goda's (2018) otherwise excellent review is a full discussion of the role played by the non-visual senses in the perception of material quality (cf. Fujisaki, 2020). At best, what one gets is an acknowledgement that certain material properties can only be assessed, or at least are much easier to assess, via another sense, such as touch (be it passive or active, the latter, note, commonly if not universally referred to as haptics). In particular, attributes such as weight, compressibility, temperature, and fine (or microgeometric) surface texture are typically easier to ascertain reliably via contact with the skin surface and/or haptic exploration than solely by means of visual inspection (e.g., Drucker, 1988; Gallace and Spence, 2014; Guest and Spence, 2003a; Krishna and Morrin, 2008). There has also long been interest in the tactile/haptic discrimination of different materials, and more specifically of different qualities of, for example, wool (Binns, 1926) or wood flooring (Berger et al., 2006).
These material properties are often distinguished from an object's geometric properties (shape, curvature, orientation, size, and volume), the perception of which is generally based on visual cues (Kahrimanovic et al., 2010). Both vision and touch can be used to determine shape (Lacey and Sathian, 2014; Norman et al., 2004). According to the neuroimaging research, there would appear to be parallel pathways in the brain for the processing of surface properties and the form of objects (Cant and Goodale, 2007; Sathian et al., 2011).
In their review, Komatsu and Goda highlight the growing body of evidence suggesting that certain material qualities may be represented in the same brain regions, regardless of the sensory modality through which those qualities happen to be perceived (e.g., Eck et al., 2013; Goda et al., 2016; see also Whitsel et al., 1989). Similarly, the brain areas involved in determining various object properties, or a person's evaluation of those properties (e.g., in terms of aesthetic appreciation), may well turn out to be shared between the senses (e.g., Brown et al., 2011; see also Schifferstein and Hekkert, 2011). A similar claim can also be made with regard to sensory attributes such as surface texture and form (Goda et al., 2016; Pérez-Bellido et al., 2018; Podrebarac et al., 2014; see also Eck et al., 2013; Sun et al., 2016; Yau, Hollins and Bensmaia, 2009).

Crossmodal Influences on Judgements of Visual Quality
Beyond conveying information concerning those physical attributes of a stimulus that cannot be ascertained visually, it is important to note that even what we consider to be visual judgements, or rather judgements of visually determined material properties, are often influenced by whatever other sensory inputs happen to be present at around the same time, whether we realize it or not (e.g., Adams et al., 2016; Hagtvedt and Brasel, 2016); and mostly the evidence suggests that we do not (e.g., Laird, 1932; Li et al., 2007). So, for example, it is well known amongst laundry detergent manufacturers (such as Unilever) that delivering 'brilliant whites' is about so much more than simply what the customer sees. Perhaps counterintuitively, adding the right 'clean' fragrance really can help to make one's whites look brighter (see Vickers and Spence, 2007). In fact, there is now a rich body of empirical data demonstrating the modulatory role of scent in our evaluation of the physical attributes (e.g., age, attractiveness, and gender) of other people (e.g., Demattè et al., 2007a; Li et al., 2007; McGlone et al., 2013).
Adding the right scent has also been shown to influence people's ratings of fabric softness and material quality (e.g., Churchill et al., 2009; Demattè et al., 2006; Laird, 1932). In recent years, the presence of congruent versus incongruent scents has also been found to affect people's perception of a variety of material properties, as well as their aesthetic response to those materials (e.g., Bone and Jantrania, 1992; Bosmans, 2006; Demattè et al., 2007b; Krishna et al., 2010; Zellner et al., 2008; though see also Schifferstein and Michaut, 2002).
As Komatsu and Goda (2018, p. 330) note: "Every sensory modality is involved in material perception. Not only that, material perception also has crossmodal aspects. For example, when we see a sweater made of fine wool, we can perceive that it will be soft and warm, or we can sense that a metal cup will be cold and hard to the touch." They go on to suggest that: "Some material properties, such as microscale roughness, hardness, coldness, and weight, are nonvisual and cannot be directly sensed visually. Nevertheless, interestingly, humans can accurately estimate such nonvisual properties from the visual appearance of materials, in a way that correlates with those haptically estimated through touching them (Baumgartner et al., 2013)." (Komatsu and Goda, 2018, p. 340; see also Yanagisawa and Takatsuji, 2015).
A growing body of empirical research now shows that the impressions offered by the non-visual senses sometimes also contribute to, or modify, the 'visual' impression of various material qualities (e.g., Hagtvedt and Brasel, 2016; Jansson-Boyd and Marlow, 2007; Laird, 1932; Murakoshi et al., 2013). Oftentimes, though, people do not seem to realize just how much information may be carried by the non-visual (and hence often unattended, or less attended) senses.
Interest in the multisensory assessment of material qualities stretches back to the early days of experimental psychology, as documented in the seminal work of the English scientist Henry Binns (e.g., Binns, 1934, 1937). Binns was charged with assessing the manner in which experts judged the quality of woollen tops in the mills, a matter that was, at the time, of great practical (not to mention pecuniary) interest to the mill owners. More than 80 years ago, Binns' research had already demonstrated the importance of both the visual and tactile assessment of quality in expert assessors. In one intriguing recent study on a related theme, Xiao et al. (2016) had their participants match photographs of fabric samples to the feel of physical fabric samples. Removing colour was found to reduce accuracy, especially when the images contained 3-D folds. Overall, images of draped fabrics, which revealed 3-D shape information, resulted in better matching accuracy than images displaying flattened fabrics. It would appear, therefore, that people use chromatic gradients to infer tactile fabric properties, at least when they happen to be available.

Sensory Dominance and Computational Accounts of Multisensory Integration
Importantly, however, the perception of material quality, established on the basis of non-visual cues, may well be overridden by what we see (Fujisaki et al., 2014), a phenomenon known as 'visual dominance' (Posner et al., 1976). So, for example, according to influential early research, our impression of the size and shape of an object may be completely dominated by vision, often overriding any discrepant tactile/haptic cues that happen to be present (Rock and Harris, 1967; Rock and Victor, 1964; see Spence, 2011a, for a review). Ernst and Banks (2002) brought some much-needed mathematical rigour to the field of sensory (i.e., visual) dominance research by demonstrating that maximum-likelihood estimation provides an excellent quantitative account of the integration of visual and tactile/haptic cues in relation to size/length judgements. In their study, participants had to judge the height of a bar that could be seen and also felt between the thumb and index finger of one hand (delivered virtually by means of two force-feedback devices). Adding noise to the visual signal resulted in the participants increasingly relying on haptic information when making their judgements. According to the maximum-likelihood estimation account of sensory dominance, the human brain combines sensory inputs in a manner that is very close to that of a statistically optimal multisensory integrator. That is, the multisensory integration of disparate unisensory inputs appears to maximally reduce the uncertainty of, or variance associated with, our multisensory estimates of external stimulus qualities (given that all sensory estimates are intrinsically noisy). In fact, the maximum-likelihood account provides a surprisingly good account of the relative contribution of each of the senses to multisensory perception in a variety of different settings, and for a variety of different combinations of stimulus modalities (e.g., Alais and Burr, 2004).
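The quantitative heart of this account is simple inverse-variance weighting: each cue contributes in proportion to its reliability, and the integrated estimate is always at least as reliable as the best single cue. The following sketch illustrates the calculation for a bar-height judgement of the kind Ernst and Banks studied (the function name and the numerical values are purely illustrative, not taken from the original study):

```python
import numpy as np

def mle_combine(estimates, variances):
    """Combine noisy unisensory estimates by inverse-variance weighting
    (maximum-likelihood estimation, in the spirit of Ernst and Banks, 2002)."""
    variances = np.asarray(variances, dtype=float)
    weights = (1.0 / variances) / np.sum(1.0 / variances)
    combined = float(np.dot(weights, estimates))
    # The variance of the combined estimate is never larger than the
    # smallest unisensory variance: integration reduces uncertainty
    combined_var = 1.0 / np.sum(1.0 / variances)
    return combined, combined_var

# Illustrative bar-height judgement: a reliable visual estimate (55 mm,
# variance 1 mm^2) and a noisier haptic estimate (50 mm, variance 4 mm^2)
height, var = mle_combine([55.0, 50.0], [1.0, 4.0])
# Vision receives weight 0.8 and touch 0.2, so the percept (54 mm) lies
# closer to the visual estimate; adding visual noise shifts weight to touch
```

On this view, 'visual dominance' for size is not absolute: it simply reflects vision's typically lower variance, and degrading the visual signal (as Ernst and Banks did) re-weights the combination toward touch.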
Nevertheless, there may still be some residual role for directed attention in explaining the patterns of sensory dominance that are sometimes observed (Meijer et al., 2019).
However, one problem that was not tackled by Ernst and Banks' (2002) seminal work concerns the binding problem, namely the problem of which cues should be integrated, and which should be kept separate. According to subsequent research on causal inference, the unity/segregation decision can be resolved probabilistically by means of the incorporation of a variety of priors (see Körding et al., 2007; though see also Chen and Spence, 2017). There has also been research into the multisensory (especially visual-tactile/haptic) integration of surface roughness (often using sandpaper samples). Indeed, over the years, an extensive body of research has investigated visual-tactile interactions in the perception of roughness (e.g., Guest and Spence, 2003b; Heller, 1982; Jones and O'Neil, 1985; Lederman and Abbott, 1981; Lederman and Klatzky, 2004; Lederman, Thorne and Jones, 1986; Warren and Rossano, 1991; Werner and von Schiller, 1932). Intriguingly, based on the available empirical evidence, it would appear that while tactile cues tend to dominate when the senses are put into conflict as far as microgeometric surface textures are concerned, vision typically dominates for the perception of macrogeometric surface properties (see also Klatzky et al., 1993). This somewhat unusual pattern of dominance presumably reflects the relative precision of the two senses at different scales (Klatzky and Lederman, 2010). Lederman et al. (1986) proposed a weighted averaging model of visuotactile texture perception. The Bayesian account has since been extended to the perception of surface texture (e.g., Yanagisawa and Takatsuji, 2015). However, given that the Bayesian causal inference account of shape/texture perception has been reviewed extensively elsewhere, it will not be covered in any more detail here.
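The unity/segregation decision can be sketched in a few lines. The toy model below (a simplification, assuming a flat prior over source locations; the function name and parameter values are illustrative, and this is not the exact generative model of Körding et al., 2007) computes the posterior probability that two cues share a single cause, which falls as the discrepancy between the cues grows relative to their combined noise:

```python
import math

def prob_common_cause(x_v, x_h, sigma_v, sigma_h,
                      prior_common=0.5, prior_range=20.0):
    """Posterior probability that visual and haptic cues share a single
    cause, assuming a flat prior of width `prior_range` over locations."""
    var_sum = sigma_v**2 + sigma_h**2
    # One cause: the two cues should agree up to their combined noise
    like_common = (math.exp(-(x_v - x_h)**2 / (2.0 * var_sum))
                   / math.sqrt(2.0 * math.pi * var_sum)) / prior_range
    # Two causes: each cue is explained independently under the flat prior
    like_separate = 1.0 / prior_range**2
    num = prior_common * like_common
    return num / (num + (1.0 - prior_common) * like_separate)

# A small visuo-haptic discrepancy favours integration (one common cause);
# a large discrepancy favours keeping the two estimates segregated
p_near = prob_common_cause(0.0, 0.5, 1.0, 1.0)   # close to 1
p_far = prob_common_cause(0.0, 6.0, 1.0, 1.0)    # close to 0
```

Only when the common-cause posterior is high does it make sense to apply the inverse-variance weighting described above; otherwise the cues are (partially) kept apart.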

Audiovisual Contributions to Multisensory Shitsukan Perception: Computational Account
While much of the research documenting multisensory contributions to material perception has focused on either visuotactile or audiotactile integration, audiovisual integration is also of interest. It is also perhaps the easiest modality combination to render digitally at the present time (see Fujisaki et al., 2014; see also Etzi et al., 2018; Fenko et al., 2011; Gerdes et al., 2014). In one fascinating study, Fujisaki et al. had 16 participants rate auditory (impact sounds), visual, and audiovisual stimuli depicting a variety of impact events. All possible combinations of six visible materials were crossed with eight different impact sounds. The researchers wanted to know which sense would dominate in terms of people's perception of the material category when the sight of one material was combined with the impact sound of another (incongruent) material. The participants had to rate how likely it was that the stimuli they were presented with indicated each of 13 material categories. Participants rated each of the six visual stimuli and each of the eight impact sounds when presented in isolation. They also had to rate all 48 possible audiovisual combinations of the two unisensory stimuli.
The results of this study indicated strong interactions between the senses in terms of participants' material perception. So, for example, when the appearance of glass was paired with the impact sound of a bell pepper, the resulting audiovisual stimulus was rated as seeming like transparent plastic (see Fig. 1). According to Fujisaki et al. (2014, p. 1): "Rating material-category likelihoods follow a multiplicative integration rule in that the categories judged to be likely are consistent with both visual and auditory stimuli. On the other hand, rating-material properties, such as roughness and hardness, follow a weighted average rule. Despite a difference in their integration calculations, both rules can be interpreted as optimal Bayesian integration of independent audiovisual estimations for the two types of material judgment, respectively." Here, though, it is perhaps worth noting that research from elsewhere in the field of multisensory perception has shown that the result of presenting incongruent pairs of multisensory stimuli may itself depend on the context in which they happen to be presented. In particular, Gau and Noppeney (2016) documented a reduced McGurk effect when audiovisual McGurk stimulus pairs were embedded in a context of incongruent audiovisual speech stimuli, as compared to when they were embedded in a stream of congruent speech stimuli instead. It may therefore be relevant to note that in Fujisaki et al.'s (2014) study, the vast majority of the multisensory material pairs presented to participants were, in some sense, incongruent (i.e., originating from pairs of stimuli that did not belong together), perhaps meaning that less crossmodal interaction would be expected.
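The two integration rules that Fujisaki et al. describe can be sketched as follows (the three-category set and the likelihood numbers below are hypothetical, chosen only to mirror the glass-plus-bell-pepper-sound example):

```python
import numpy as np

def integrate_category_likelihoods(p_visual, p_auditory):
    """Multiplicative rule for material-category judgements: a category is
    only rated likely if it is consistent with BOTH the visual and the
    auditory evidence (cf. Fujisaki et al., 2014)."""
    p = np.asarray(p_visual, dtype=float) * np.asarray(p_auditory, dtype=float)
    return p / p.sum()

def integrate_property_rating(r_visual, r_auditory, w_visual=0.5):
    """Weighted-average rule for continuous properties (roughness, hardness)."""
    return w_visual * r_visual + (1.0 - w_visual) * r_auditory

# Hypothetical likelihoods over [glass, transparent plastic, wood]:
# vision favours glass, while the impact sound favours plastic
combined = integrate_category_likelihoods([0.70, 0.25, 0.05],
                                          [0.10, 0.80, 0.10])
# The multiplicative rule hands the decision to 'transparent plastic',
# the only category reasonably consistent with both cues
```

The contrast between the two rules is the interesting point: category judgements behave like an intersection of the evidence, whereas continuous property ratings behave like a compromise between the two unisensory estimates.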

Crossmodal Correspondences and Multisensory Shitsukan Perception
Consistent with the view that many non-visual associations may be stored as learned associations, or in some cases crossmodal correspondences (see also Fenko et al., 2010b; Spence, 2011b), we internalize the statistics of the environment (Peeva et al., 2004; Peters et al., 2015, 2018). Hence, crossmodal relations between material properties will likely be rapidly internalized as coupling priors (Chen and Spence, 2017; Ernst, 2007; Komatsu and Goda, 2018; Spence, 2011b; Yuan et al., 2017).
It should, though, be borne in mind that such crossmodal relations may relate to putatively amodal material properties, such as surface texture or form, or else to what Walker-Andrews (1994) refers to as arbitrary crossmodal relations, such as the ring tone that is associated with your mobile phone, say, or the lemon or pine fragrance added to many cleaning products. There is also an interesting, and perhaps orthogonal, question here as to whether the associations (or correspondences) between the senses are all learnt from the statistics of the environment (or the marketplace; e.g., see Ye et al., 2019), and hence might, in some sense, be considered arbitrary, like the association between packaging attributes and likely product qualities, or push-button sounds, or whether any have much older precedents in human development (and hence might be considered by some as putatively innate; see LaBonte, 2009; Meert et al., 2014; Saad and Gill, 2000; Spence, 2011b). One might also want to draw a distinction here between those cases where the different senses are picking up on the same putatively amodal stimulus property (such as visual and tactile/haptic estimates of shape, size, or surface texture), versus on nonredundant stimulus dimensions (as in the case of crossmodally corresponding stimulus dimensions, such as, for example, the corresponding dimensions of auditory pitch and visual size, or elevation; see Deroy et al., 2018; see also Walker et al., 2010, 2017).

(Fig. 1 adapted from Fujisaki et al., 2014, Fig. 4.)
The sections that follow selectively review some of the most intriguing evidence concerning the role played by the non-visual senses in shitsukan (Note 2).

Tactile Shitsukan Perception
Over the course of the last century, many researchers have investigated the perception of shitsukan (though, outside Japan, the research is often not described as such) by means of unisensory cues presented in one of the non-visual senses. As a matter of fact, the majority of this research has focused on the tactile, or haptic, perception of material quality (e.g., Binns, 1926; Culbert and Stellwagen, 1963; Hollins et al., 1993; Ludden and Van Rompay, 2015; Okamoto et al., 2013; Philippe et al., 2004; Picard et al., 2003; Yoshida, 1968a, b, c). This is what we might call the 'feel' of quality. Much of the unisensory tactile/haptic research has focused on an assessment of the material quality of fabrics. In terms of the affective response to materials, it turns out that softness is important, especially for those garments worn close to the skin (e.g., Chang et al., 2015; Kergoat et al., 2012; see also Etzi et al., 2014; Teli, 2015). However, there is also a large body of research concerning the tactile/haptic aspects of object recognition/perception (e.g., Karlsson and Velasco, 2007; Klatzky et al., 1985; Lederman, 1982; Sonneveld and Schifferstein, 2008; Spence and Gallace, 2011). There has even been some limited research in the field of tactile aesthetics that may be relevant here (see Lindauer, 1986; Lindauer et al., 1986).
Key material dimensions in the tactile/haptic modality include surface roughness, surface texture, compliance, weight, and temperature (or thermal diffusivity; Bergmann Tiest and Kappers, 2009; see also Bergmann Tiest and Kappers, 2006; Bhatta et al., 2017; Chen et al., 2009; Goebl et al., 2014), and they are fundamental to the haptic perception of object properties (Bergmann Tiest, 2010; see also Fujisaki, 2020). Beyond that, there are also the primarily tactile/haptic properties of liquid materials, namely viscosity and wetness (Bergmann Tiest, 2015). One of the other important judgements that a person can make concerning the material properties of a surface or object relates to its perceived 'naturalness'. Over the last decade or so, a number of researchers have investigated the relative contribution of the different senses to the perception of this material quality (e.g., Binninger, 2017; Labbe et al., 2013; Nikolaidou, 2011; Overvliet and Soto-Faraco, 2011; Overvliet et al., 2016; Whitaker et al., 2008; see also Fujisaki et al., 2015; Kanaya et al., 2016). Here it should be noted that there is a growing interest in how other packaging cues, such as a matte finish, may also be linked to, and hence help to convey, product naturalness (e.g., Han, 2018; Marckhgott and Kamleitner, 2019). Understandably, many companies have also been interested in the question of whether it is possible to render/manufacture a natural finish (e.g., for product packaging or building materials).
As far as tactile perception is concerned, the impression of shitsukan may depend on the particular region of the skin surface that is used to evaluate a material (see Ackerley et al., 2012; Etzi et al., 2016). Not only are different sensitivities documented at different skin sites, but some tactile receptors, namely C-tactile afferents, are only found in the hairy skin (e.g., see Löken et al., 2009; McGlone and Spence, 2010). Several researchers have demonstrated that one and the same material may be rated quite differently as a function of the skin surface against which it makes contact (Ackerley et al., 2014; Essick et al., 2010; Etzi et al., 2016). At the same time, however, some surprisingly robust individual differences in the 'Need for Touch' (NFT) have also been identified by researchers (e.g., Peck and Childers, 2003a, b). Peck and Childers (2003a, p. 431) define it as "a preference for the extraction and utilization of information obtained through the haptic system". An individual's NFT is typically assessed by means of their response to a standardized series of statements. Those scoring higher (i.e., agreeing more with the statements) are rated as higher in their autotelic need for touch. The sorts of statements used by Peck and Childers to pick out those high in the autotelic NFT include the following: 'Touching products can be fun'; and 'I find myself touching all kinds of products in stores'. According to Peck and Childers (2008, p. 207): "The instrumental dimension of NFT refers to those aspects of touch that reflect outcome directed touch with a salient purchase goal. . . Autotelic touch involves a consumer seeking fun, sensory stimulation, and enjoyment with no purchase goal necessarily salient".
In the years since the NFT framework was put forward, a number of studies have documented that it provides a useful means of distinguishing meaningfully between different groups of consumers (Krishna and Morrin, 2008; see also Citrin et al., 2003). In practice, differences in the NFT typically mean that people are differentially affected by, and hence differentially seek out, the tactile/haptic qualities of an object or material (Ackerman, 2016; Childers and Peck, 2010; Spence, 2019; Workman, 2010). Although the matter remains unclear, the suggestion is that these individual differences in the NFT reflect more cognitive (i.e., central, rather than peripheral receptor-based) individual differences. Intriguingly, similar individual differences have not been reported for the other higher 'spatial' senses.

Auditory Shitsukan Perception
A separate body of experimental research has assessed auditory shitsukan perception (e.g., Björk, 1985; Giordano and McAdams, 2006; McDermott and Simoncelli, 2011; Zhang et al., 2017). Auditory cues turn out to play an important role in the assessment of product quality (Björk, 1985), what one might be minded to describe as the sound of quality. In terms of material category, Giordano and McAdams assessed people's ability to identify object materials on the basis of impact sounds (that is, the sounds made when something strikes an object). These researchers demonstrated that people could discriminate almost perfectly between the various material categories on the basis of sound (e.g., steel-glass versus wood-plexiglass). The available functional magnetic resonance imaging (fMRI) research here suggests that a sub-region in the ventro-medial pathway appears to be specialized for the task of auditory material perception (Arnott et al., 2008).
A particularly rich vein of research on the auditory assessment of material properties relates specifically to the material qualities of foods that are associated with the sounds resulting from oral mastication (see Spence, 2015; Zampini and Spence, 2004, for reviews); think here of the crisp, crunchy, crackly, or squeaky qualities that are sometimes experienced in food. Indeed, as Fujisaki et al. (2014, p. 1) have noted: "material perception is a critical ability for animals to properly regulate behavioural interactions with surrounding objects (e.g., eating)" (see also Nagano et al., 2014).
In terms of other material properties, research has shown that while the majority of people do not believe that they could determine whether a liquid is hot or cold simply by listening to the sounds of pouring (cf. Stuckey, 2012), most of us turn out to be significantly better than chance at this task. Under forced-choice conditions, using nothing more than auditory cues, people are able to correctly distinguish the sound of recently boiled water from water that has just been taken out of the fridge (Velasco et al., 2013a, b; see also Wildes and Richards, 1988, on the recovery of material properties from sound). In the case of liquids such as water, viscosity changes due to temperature lead to a distinctive change in pitch (Parthasarathy and Chhapgar, 1955). It would appear, then, that we internalize this, and other, environmental statistics (Peeva et al., 2004), despite the fact that we typically tend to use thermal and/or visual cues to make such an assessment of the temperature of an environmental stimulus (Note 3) (cf. Fenko et al., 2010a; Wastiels et al., 2012).
The sounds of opening and closing product packaging also convey useful information (see Wang and Spence, 2019, for a review). Recently, for example, it has been demonstrated that people tend to rate wine as tasting better (i.e., as being of higher quality) when they hear the sound of a cork-stoppered bottle being opened rather than when they hear a screw-cap bottle being opened instead. In fact, there is a rich literature of research that has investigated the influence of the sounds of packaging opening and usage in the food and beverage category (Spence and Wang, 2015).
The sounds of car engines, not to mention car doors, and even the sound of the dashboard when tapped with the knuckles, have long been important areas of psychoacoustics research (Kanie et al., 1987; Montignies et al., 2010; see Spence and Zampini, 2006, for a review). There is also an intriguing literature on the design of push-button sounds (Altinsoy, 2020). Academic researchers have worked hard to modify the sounds made by products such as cigarette lighters (Lageat et al., 2003), air-conditioning units (Susini et al., 2004), and vacuum cleaners (Wolkomir, 1996). Even the sound of a mascara closing, or the distinctive pop of the Snapple bottle, has been engineered to sound 'just so' (Byron, 2012), to provide, in other words, the sound of quality (e.g., Ozcan and van Egmond, 2012; see also Avanzini and Crosato, 2006; Kim et al., 2007).

Olfactory Shitsukan Perception
As far as the chemical senses are concerned, researchers have investigated the perception of the olfactory quality of scents/perfumes (e.g., studying the perception of complexity, and aesthetic appreciation; e.g., Rabin, 1988; Schiffmann, 1974). There is also a separate literature on those factors influencing the assessment of the (material) quality of food and drink. The latter, note, is often subsumed within the literature on sensory science/sensory studies, and hence published in journals such as Food Quality and Preference and the Journal of Sensory Studies.

Interim Summary
As the results reviewed in this section demonstrate, shitsukan perception occurs in the tactile, auditory, and olfactory modalities when experienced unimodally (see also Haase and Wiedmann, 2018). Of course, that said, the 'million dollar question' here is whether those assessments of shitsukan, or material quality, that are made under unisensory conditions have any predictive value as far as what people will perceive, or report, under conditions where multiple senses may be used in product evaluation (Ballesteros et al., 2005). Indeed, given that visual dominance is such a ubiquitous feature of our object identification (Posner et al., 1976; Spence, 2011a), the contribution of nonvisual cues to shitsukan perception might well turn out to be less important than the unisensory studies reviewed above would suggest (though see also Hershberger and Misceo, 1996; Komatsuzaki et al., 2016).

Multisensory Contributions to Shitsukan Perception
While the unisensory assessment of shitsukan undoubtedly constitutes an interesting line of laboratory research, out in the real world, our experience and evaluation of material properties is typically based on multisensory cues. Importantly, a large body of empirical research demonstrates that product-intrinsic (not to mention some product-extrinsic) sensory inputs are combined to deliver multisensory shitsukan perception (see also Qiao et al., 2014; Schütte et al., 2008). Understandably, many researchers have been interested in trying to understand the rules of multisensory integration and how they relate to the perception of shitsukan/material quality (e.g., Fujisaki et al., 2014; Lederman et al., 1986; cf. Ernst and Banks, 2002). As was mentioned earlier, Bayesian causal inference, built on the Maximum-Likelihood Estimation approach (Ernst and Banks, 2002), has seemingly done an excellent job in this regard (e.g., Fujisaki et al., 2014; Peters et al., 2018; Yanagisawa and Takatsuji, 2015).
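The core of the Maximum-Likelihood Estimation account can be sketched in a few lines: each sense delivers a noisy estimate of the same material property, and the statistically optimal observer weights each cue by its reliability (the inverse of its variance). The sketch below is illustrative only; the cue names and all numerical values are assumptions for demonstration, not data from any of the studies cited here.

```python
import numpy as np

def mle_combine(estimates, variances):
    """Reliability-weighted (MLE-optimal) combination of unimodal estimates."""
    weights = np.array([1.0 / v for v in variances])  # reliability = 1/variance
    weights /= weights.sum()                          # normalise to sum to 1
    combined = float(np.dot(weights, estimates))      # weighted mean estimate
    combined_var = 1.0 / sum(1.0 / v for v in variances)
    return combined, combined_var

# Hypothetical visual and haptic estimates of, say, surface roughness:
visual, haptic = 0.8, 0.5    # unimodal estimates (arbitrary units)
var_v, var_h = 0.01, 0.04    # vision is assumed to be the more reliable cue
s, var_s = mle_combine([visual, haptic], [var_v, var_h])
# The combined estimate is pulled toward the more reliable (visual) cue,
# and its variance is lower than that of either cue taken alone.
```

Two properties of this scheme are worth noting: the combined estimate lands closer to whichever cue is more reliable (which is one way of formalising "visual dominance" when vision happens to be the less noisy channel), and the combined variance is always smaller than either unimodal variance, capturing the benefit of multisensory integration.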
However, while vision normally dominates, it is important to note that visual cues do not always allow for the accurate prediction of non-visual material properties. As Fujisaki et al. (2014) suggest, vision is more useful for assessing surface properties, whereas auditory cues may be more informative concerning the internal properties of a material or object. Wastiels et al. (2013) demonstrated as much in a study in which architecture students estimated the properties of various building materials. Indeed, while visual cues tend to dominate multisensory material perception, there are also some notable exceptions. So, for instance, Adams et al. (2016) have reported that visual glossiness can be affected by the haptic slipperiness of a surface. Nagai et al. (2015) have reported that it takes longer for people to process non-visual than visual material properties, though this presumably has a lot to do with differences in the manner of exploration in the different senses (Owens et al., 2016; Sun et al., 2016). Being able to ascertain the felt weight of an object, or especially a product, in the hand is undoubtedly important in terms of shitsukan, given that it turns out to be one of the key factors influencing the perception of quality (e.g., Jostmann et al., 2009; Kampfer et al., 2017; Michel et al., 2015; Schneider et al., 2011; and see Spence and Piqueras-Fiszman, 2011, for a review).

Auditory Contributions to Multisensory Shitsukan Perception
In terms of auditory contributions to multisensory shitsukan, the perception of felt texture is often influenced by modifying any sounds that are elicited by the interaction (e.g., Altinsoy, 2008, 2020; Guest et al., 2002; Jousmäki and Hari, 1998; Suzuki et al., 2006, 2008; and again see Werner and von Schiller, 1932, for early work in this area). For instance, several studies of the 'parchment skin' illusion have revealed that people's perception of their own skin can be modified simply by changing the sound of the interaction when they rub their hands together in front of a microphone (see also Senna et al., 2014, on the marble hand illusion). Guest and colleagues adapted this approach in order to demonstrate that it was also possible to modify people's perception of sandpaper samples (i.e., surface properties unrelated to the body). The participants in the latter studies rubbed a selection of swatches (that were hidden out of sight in a box; see Fig. 2) with a finger while the sounds made by the interaction with the material were modified, building on earlier work by Lederman (1979) on auditory texture perception (see also Lemaitre and Heller, 2012).
Sound, in other words, is an integral part of our interaction with the majority of materials, objects, and products. Adding that sound, or modifying it, has been shown to influence people's perception/behaviour in a diverse range of situations including, in one case, an augmented reality clothing application built around a fashion store mirror (Ho et al., 2013). In this study, the sounds of different realistic jacket materials rustling were synchronized with people's movements when standing in front of an augmented mirror that allowed them to try on virtual clothing visually. Meanwhile, in another project, we demonstrated that modification of the sound made when women walked in high heels on an augmented catwalk significantly influenced both their impressions and emotional reactions (Tonetto et al., 2014). In particular, the sound that the female participants heard when either of their heels touched the ground was changed in order to convey different shoe and floor material interactions, some louder than others (see also Bresin et al., 2010; Furfaro et al., 2015; Serafin et al., 2011; Turchet et al., 2010).
In earlier research, Zampini et al. (2003) had modified the sound made by an electric toothbrush (boosting or cutting the high frequency components of the sound of the motor). Once again, this experimental manipulation was found to influence the perception of those who used the product. And staying with the mouth for a moment, sonic cues have also been added to modify the perceived texture of crispy foods (e.g., Masuda and Okajima, 2011; Zampini and Spence, 2004; see Spence, 2015, for a review). Researchers in Japan have even investigated whether the sound of the appropriate food texture can be used to help those elderly individuals forced to eat pureed meals to enjoy their food more (Endo et al., 2016, 2017; Fujisaki, 2020). Finally here, Spence and Zampini (2007) modified the sound made by an aerosol spray deodorant while in use and showed a crossmodal influence on users' perception of the product and its powerfulness.

Olfactory Contributions to Multisensory Shitsukan Perception
As far as olfactory cues to the multisensory perception of material/product quality are concerned, the classic study was published by Laird (1932) almost 90 years ago. Laird reported that women's judgements of the quality of silk stockings depended on the scent with which the stockings had been impregnated. The 250 housewives in Laird's study were shown to prefer stockings with a narcissus scent over those with a 'natural' scent, even though the stockings were otherwise identical. Intriguingly, when asked for the reason behind their preference for one pair of stockings over the others, the majority of those questioned apparently pointed to differences in durability, sheen, or weave (i.e., to differences in the tactile and/or visual material properties), rather than to differences in their olfactory properties. This observation provides an early example suggesting that we are sometimes unaware of the sensory inputs that may actually be driving our perceptual decisions. Demattè et al. (2006) followed up on Laird's (1932) seminal study, demonstrating that olfactory cues influenced the tactile perception of fabric softness using computer-controlled stimulus presentation (i.e., an eight-channel olfactometer and a fabric carousel). The results revealed that participants rated fabric swatches as feeling significantly softer when presented with a lemon odour than when presented with an animal-like odour instead, thus demonstrating that olfactory cues can indeed modulate tactile perception. Meanwhile, Churchill et al. (2009) have reported that the addition of a variety of fragrances modified the perceived textural properties of shampoo and hair.
At the other end of the spectrum in terms of quality perception, it is worth drawing attention to the anecdotal reports hinting at the profound effect played by 'new car smell' in modifying people's perception of a vehicle (Moran, 2000a, b, c; Van Lente and Herman, 2001). What is more, such crossmodal influences on the perception of quality do not just affect the consumer's perception of a new vehicle, but also appear to modify people's perception of their own vehicles following servicing. Just take the reports from SC Gordon Ltd, coachbuilders of Rolls-Royce cars, who have developed their own unique new car smell designed specifically to mimic the aromatic blend of leather and wood of a vintage 1965 Silver Cloud model. The 'car cologne' is applied when new cars come in for repair. According to Hugh Hadland, Managing Director of the company, "People say they don't understand what we've done, but that their cars come back different and better" (quote from Spence, 2002). It seems that just one squirt of the luxury perfume is enough to restore that sense of luxury in even the most expensive of consumer purchases. Of course, such an olfactorily-inspired approach can also be used to add value when it comes to reselling a car (Aikman, 1951; Hamilton, 1966; Wright, 1966). And finally here, though beyond the scope of the present article, it should also be mentioned that there is an intriguing literature examining how the introduction/manipulation of olfactory cues can influence people's perception/reception of works of art (e.g., Cirrincione et al., 2014; Pursey and Lomas, 2018).

A Taste of Shitsukan
Research conducted over the last decade or so has demonstrated the important role played by the metal in which a spoon has been coated on gustatory perception (e.g., Laughlin et al., 2009, 2011). Indeed, several studies have assessed people's ability to determine the material properties of stimuli when placed in the mouth (e.g., Howes et al., 2014; Jacobs et al., 1998). Importantly, such effects were demonstrated by Piqueras-Fiszman et al. (2012) in the absence of any visual input (that is, the participants were blindfolded). In the latter study, participants tasted samples of cream that were slightly sweet, sour, bitter, salty, or plain. The samples were tasted from spoons that were identical in terms of their size and weight, but had been coated in zinc, copper, gold, and stainless steel. The results showed that the taste properties of the creams were influenced by the material properties of the spoons, thus providing an example of gustatory contributions to material perception.
Here, of course, it is interesting to note that any gustatory influence of the spoon identified under conditions of blind tasting in Piqueras-Fiszman et al.'s (2012) study may simply be overridden by any visual associations that a consumer may have when interacting with such cutlery (see Spence and Piqueras-Fiszman, 2014). That is, while one metal might improve the taste of foods (or rather enhance a specific taste attribute, such as salty, bitter, or sour) when a taster is denied vision, it is easy to imagine how catching sight of a gold spoon, say, might immediately set specific quality expectations (i.e., affective, and possibly also sensory) and hence dominate, or modify, the ensuing product experience (see Aldersey-Williams, 2011) (Note 4). Note also that Harrar and Spence (2013) have documented a modest influence of the colour of plastic spoons on the taste of yoghurt.

Context and Product-Extrinsic Influences
The senses typically provide complementary information concerning the material properties of an object or surface in the natural world. However, that is by no means always the case (see Björkman, 1967; and Stanton and Spence, 2020, for a review). As we have seen already, the senses sometimes access more-or-less modality-specific material properties. According to Fenko et al. (2010b), though, the relative importance of the different senses often changes over the various stages of product interaction/lifespan (e.g., from initial purchase through to eventual disposal). What is more, either deliberately, or accidentally, incongruent sensory cues are occasionally associated with the impressions delivered by the different senses in augmented or virtual reality settings (McGee et al., 2002). Relatively small differences in the impression delivered by the different senses may well not be noticed, in part because they may be eliminated as a result of multisensory integration (i.e., often visual dominance). However, when the difference between the senses becomes too large, the discrepancy may well become apparent to the observer. One of the important unanswered questions in this research area concerns the magnitude of the intersensory discrepancy at which crossmodal binding switches to segregation (see Chen and Spence, 2017); this is related to the causal inference problem in multisensory perception (Körding et al., 2007).
Sensory incongruency is sometimes deliberately used by designers. Such incongruency may give rise to either hidden or visible novelty (e.g., Ludden and Schifferstein, 2007; Ludden et al., 2009; see also Schifferstein and Spence, 2008). Visible novelty is apparent even without the observer having to interact physically with the object, whereas hidden novelty may only reveal itself when interacting physically with the object. One example of the latter is offered by those vases that look like they are made of cut glass, but which are actually made of plastic, and so are much lighter than expected when they are picked up (see Schifferstein and Spence, 2008). Anecdotally, hidden novelty can lead to memorable material interactions. For instance, I vividly remember the occasion more than a decade ago when picking up what looked like a regular vellum envelope in a Michelin-starred restaurant. I was shocked to discover that it actually had the feel of skin (Note 5). Neuroimaging research suggests that visual-tactile material incongruency gives rise to activation in the precuneus (Kitada et al., 2014). As yet, however, and as noted already, there has been relatively little neuroimaging research specifically on multisensory incongruency in the case of material perception.
In the case of food experience, such surprising experiences of the food/dish are often desirable and, in some cases, even increasingly expected by diners, especially those visiting modernist establishments (e.g., Spence, 2017;Velasco et al., 2016). There is some intriguing research on the material properties of foods that are, for example, associated with freshness (Arce-Lopera et al., 2012;Imura et al., 2016; see also Meert et al., 2014). At the same time, however, there is also an emerging interest in trying to change the material properties of food by means of augmented reality (e.g., Huang et al., 2019;Ueda et al., submitted).

Product-Extrinsic Influences
While much of the research on material perception has tended to focus on the influence of product-intrinsic cues, i.e., cues such as roughness or shape that can be discerned both visually and haptically (e.g., Bergmann Tiest and Kappers, 2007), an emerging body of research has started to highlight the importance of a variety of contextual cues, such as the influence of background music on tactile evaluation. So, for example, one study documented a significant effect of sexy music on ratings of the sexiness of the tactile stimulation delivered by a robotic stroking device (Fritz et al., 2017). Meanwhile, other researchers have recently reported that soft music (defined as slower-tempo, smooth-flowing rhythms, smoothly connected legato-like notes and consonant harmony, string instruments, and less variation in volume) modulates people's ratings of the softness of a material (see Imschloss and Kuehnl, 2019) (Note 6).
Of course, at one level, the use of fragrance to modify material perception might also be considered an example of a product-extrinsic cue. Indeed, the influence of fragrance on product perception often appears to occur regardless of whether that fragrance is perceived to have originated from the product or material, or merely to be present as a contextual cue (see Demattè et al., 2006; Spence, 2002). As yet, I am not aware of relevant research that has attempted to assess whether the integration of information presented in different modalities differs between the two cases (i.e., depending on whether the odorant is treated as being product-intrinsic or merely present in the atmosphere).

At the same time, however, it would also appear that the consumer's response to deliberate 'designed' intersensory incongruency is also determined, in part, by the context in which it is presented: think high-end design store vs. IKEA, or Michelin-starred modernist restaurant versus canteen (Schifferstein and Spence, 2008; see also Sundar and Noseworthy, 2016a, b). Sundar and Noseworthy have reported on some intriguing work looking at cross-sensory congruency/incongruency as a function of people's perception of the purported company/brand involved. They suggest that different strategies may be more appropriate for different brands/brand personalities. In particular, visual-tactile incongruency in packaging design appears to be better received for those brands that are rated as exciting (see also Littel and Orth, 2013). Sundar and Noseworthy conducted field and laboratory studies in which their participants were presented with products packaged in such a way that they either looked and felt textured (crossmodally congruent), or they looked textured but felt like something else (crossmodally incongruent).
The results showed that 'sincere' brands (like Hallmark, Ford, Coca-Cola) were preferred when there was crossmodal congruency, while those brands that were judged more exciting (BMW, Pepsi, Mountain Dew) were preferred when there was incongruency between the seen and felt texture of the packaging instead.
Another kind of perceptual switch between mismatching appearance cues occurs when products or surfaces present images on the underlying substrate/surface (e.g., Childs and Henson, 2007; Jansson-Boyd and Marlow, 2007). While the image, in such cases, is normally key, there are occasions where a deliberate attempt is made by the creator to draw the viewer's attention to the tactile/haptic/textural attributes of such surfaces. This is captured in the following quote from Durand (1995, p. 150), who describes how the viewer's attention may shift from the visual impression to touch/haptics while inspecting a photograph, and how the very frame of reference/object of attention changes in the process: "It is interesting also to take into account those points at which perception shifts from one regime to another – for example, how in some photographs attention moves from the thing represented to an awareness of texture, say the grain of the skin or the weave of foliage as they become identified with the photographic texture itself. When that happens, it is a real event, a moment of purely visual thought that takes place – as we shift from a regime of pure opticality to the optical-tactile (or 'optical-haptic' in Alois Riegl's terminology)" (Note 7).
Another area of interest in terms of the impact of contextual cues on material perception concerns what happens when people respond to objects as a function of where those objects are handled (e.g., museum or science laboratory), or what people are told about the provenance of what they are handling (rare artefact or cheap replica). While none of the various grant applications that I have been involved in on this very topic were ever funded, I nevertheless still find the question to be both intriguing and important (see Chatterjee, 2008; Pye, 2007). In somewhat conceptually-related research, it has been shown that what people believe about other kinds of tactile stimulation (e.g., whether they believe a skin cream to be cheap or expensive, or who they believe to be stroking their arm) can influence both the behavioural and neural responses to that touch (e.g., McCabe et al., 2008).

Conclusions
Despite the evident, and to an extent understandable, focus on the visual as far as research on the theme of shitsukan is concerned (see Komatsu and Goda, 2018, for a recent authoritative review), it is important to remember that our experience of the material qualities that we physically evaluate is typically a multisensory experience (e.g., Schifferstein and Hekkert, 2008; Spence, 2009a). And while visual cues may very well, and very often, dominate the resulting experience and hence judgement (see Posner et al., 1976; Spence, 2011a), that does not mean that the other senses can be ignored, even though they are undoubtedly harder to render digitally, at least given the currently available technology. What is more, as we have seen, there are a number of situations in which one of the non-visual senses dominates, or else modulates multisensory quality perception (see Adams et al., 2016; Fujisaki et al., 2014; Lederman et al., 1986).
Ultimately, it is important to remember that shitsukan is a fundamentally multisensory construct, one that, as we have seen time and again throughout this review, can be influenced by what we hear, feel, smell, and even, on occasion, what we taste/feel in the mouth (see Howes et al., 2014). And true to the insights of 'humaneering' (otherwise known as 'consumer engineering'), a movement that emerged in North America after the Great Depression, it is the subtle effects, like the feeling experienced by the hand on coming into contact with the lining of a fur coat, or the arbitrary scent applied to the stockings in Laird's (1932) classic study, that can really make all the difference to our evaluation of products (e.g., Sheldon and Arens, 1932; see also Cox, 1967). At the same time, however, it is also important to bear in mind that the sensory interactions observed when one sense is removed (such as when participants' vision is occluded) may not be the same as those that are observed when all of the consumers' senses are in operation.
Another area of growing interest is the study of crossmodal correspondences in relation to material properties, both establishing the underlying correspondences, and then also investigating how they influence perception/evaluation (Blijlevens et al., 2009; Decré and Cloonan, 2019; Imschloss and Kuehnl, 2019). This seems likely to become an increasingly important area for those working on material perception in the years ahead (see also Spence, in press). Indeed, there is growing interest in trying to convey material properties such as pinching and scrunching fabric via mobile devices (see Cano et al., 2017; Xiao et al., 2016). What is more, an added benefit of enabling tactile interaction via digital channels is the likely increase in perceived ownership that such tactile interactions appear to induce (Brengman et al., 2019; see also Pantoja et al., 2020).
The last few years have seen a rapidly expanding interest in the area of augmented retail, as, for example, in the case of online clothing/shopping applications (e.g., de Vries et al., 2018; Flavian et al., 2019; Heller et al., 2019; Xiao et al., 2016). There is also growing interest in the design of material/object interaction sounds, in everything from push-buttons (Mortensen et al., 2009) to sporting equipment (Roberts et al., 2005; see Stanton and Spence, 2020, for a review). Interest has also grown in the area of multisensory contributions to product packaging (Balaji et al., 2011; see also Chen et al., 2009), and in particular, the learnt associations that may drive product perception/evaluation (De Kerpel et al., 2020; Marckhgott and Kamleitner, 2019). All of these, then, represent promising areas for explicitly multisensory/crossmodal shitsukan research.

2. A similar theme runs through Lipps and Lupton's (2018) recent volume, The senses: Design beyond vision. The latter catalogue covers many of the design objects presented at an exhibition held at the Cooper-Hewitt Museum in New York a few years ago, illustrating material qualities that go beyond what is ascertainable by means of visual inspection.
3. Wang and Spence (2017) have also shown that temperature can, in some sense at least, be conveyed musically, by basing the compositions on the crossmodal correspondences involving audition (see Wallmark, 2019).
4. Relevant here, the available research also shows how being able to see and feel the wine glass radically modifies/enhances the tasting experience, even though in the absence of such cues, no difference between the experience of wine presented in different wine glasses is typically observed (see Spence, 2011c, for a review).
5. This, I suspect, felt much better than the sandpaper envelope in which the bill is currently presented at the London restaurant Restaurant Story (https://restaurantstory.co.uk/), presumably corresponding to the pain of payment.
6. Such findings can perhaps be seen as building on earlier research in which Reinoso Carvalho and colleagues demonstrated that the creaminess of a luxury chocolate could also be modified by playing one of two pieces of music (Reinoso Carvalho et al., 2017).