Abstract
Social interactions often require the simultaneous processing of emotions from facial expressions and speech. However, the development of the gaze behavior used for emotion recognition, and the effects of speech perception on the visual encoding of facial expressions, are less well understood. We therefore conducted a word-primed face categorization experiment in which participants from three age groups (six-year-olds, 12-year-olds, and adults) categorized target facial expressions as positive or negative after priming with valence-congruent or -incongruent auditory emotion words, or no words at all. We recorded our participants’ gaze behavior during this task using an eye-tracker, and analyzed the data with respect to the fixation time toward the eyes and mouth regions of faces, as well as the time until participants made the first fixation within those regions (time to first fixation, TTFF). We found that the six-year-olds showed significantly higher accuracy in categorizing congruently primed faces compared to the other conditions. The six-year-olds also showed faster response times, shorter total fixation durations, and faster TTFF measures in all primed trials, regardless of congruency, as compared to unprimed trials. We also found that while adults looked first, and longer, at the eyes as compared to the mouth regions of target faces, children did not exhibit this gaze behavior. Our results thus indicate that young children are more sensitive than adults or older children to auditory emotion word primes during the perception of emotional faces, and that the distribution of gaze across the regions of the face changes significantly from childhood to adulthood.
1. Introduction
As children grow up and begin to attend school, they take part in increasingly complex social interactions. During this time period, from approximately six years of age through to adolescence, children dramatically improve their ability to accurately perceive emotional facial expressions. More specifically, studies show that while children are generally able to accurately perceive happy faces from a young age, the perception of negative facial expressions such as anger, fear, and sadness develops more slowly (De Sonneville et al., 2002; Gao and Maurer, 2010; Mancini et al., 2013; Rodger et al., 2015; Vesker et al., 2018a). A key mechanism involved in these changes could be the gaze behavior children use to encode emotional faces. As will be described in greater detail below, previous studies have shown that the regions of the face differ in their relevance for the perception of various facial expressions. Furthermore, developmental changes in the deployment of gaze during emotion identification tasks by children (Pollux et al., 2014) strongly suggest that gaze behavior is crucial to our understanding of the development of emotional face perception.
One sometimes overlooked aspect of the development of children’s ability to process facial expressions is the possible influence of speech perception, which also undergoes significant development from the age of approximately six years (Bahn et al., 2017; Kauschke et al., 2017). Beyond this overlap in developmental timeframes, the two modalities also exhibit a high degree of ecological co-occurrence. Speech and face perception frequently take place simultaneously during social interaction, and the ability to effectively integrate information from both of these sources is crucial for the accurate and rapid evaluation of social dynamics and the planning of one’s own corresponding actions (Campanella and Belin, 2007; Paulmann and Pell, 2011). Evidence from previous studies, which will be further described below, indicates that the perception of faces can be significantly influenced by both verbal and non-verbal auditory input.
Based on these findings, the present study was motivated by questions of whether and how the deployment of participants’ gaze during the perception of emotional facial expressions might be influenced by auditory emotional information, and how this relationship might change over the course of development.
1.1. Gaze Distribution for Emotional Facial Expressions
A number of studies have shown that the features of the human face differ in their importance for the evaluation of various emotional facial expressions. Both Eisenbarth and Alpers (2011) and Schurgin et al. (2014) found that the eye region tended to be more relevant when looking at sad faces, while the mouth region was more relevant when participants looked at happy or joyful faces. Scheller et al. (2012) similarly found that when participants classified emotional faces, they looked longer at the mouths of happy faces, and longer at the eyes of fearful and neutral faces. Calvo et al. (2018) found that when participants were asked to identify dynamic expressions in video clips, they showed the longest gaze time proportion for the eye region of negative expressions such as sadness, anger, fear, and disgust compared to happiness. By contrast, participants showed the longest gaze time proportion for the mouth region of happy expressions compared to those negative expressions. Meanwhile, Bodenschatz et al. (2019) found that even when emotional facial expressions were only briefly presented as masked primes, they could induce participants to orient their gaze in a particular fashion when viewing a subsequent neutral face target: fearful primes elicited longer dwell times on the eye region of the neutral face target, while happy primes elicited longer dwell times on the mouth region. Thus, considering all the above findings, it appears that the eye region tends to be more important for the perception of negative expressions, while the mouth region appears to be more important for the perception of positive facial expressions.
From a developmental perspective, Pollux et al. (2014) demonstrated that compared to children, adults fixate longer on the eye region of faces when identifying sad, happy, and fearful emotional expressions, while children fixate longer on the mouth than adults. However, with practice children increase their fixation time toward the eyes of fearful and sad faces, demonstrating a more adult-like looking pattern for these faces. Meanwhile, Nakano et al. (2010) showed that while younger children initially prefer to look at the mouths of speaking faces in videos, this preference reverses in favor of the eyes with increasing age.
1.2. The Effects of Auditory Information on Gaze and Emotion Perception
As mentioned earlier, we are primarily interested in the role that auditory information might play in the gaze patterns observed during emotional categorization, since emotion perception often involves the evaluation of both speech and facial expressions during social interactions. A number of studies on speech comprehension have demonstrated that although adult observers typically allocate more gaze towards the eye region of faces, the addition of audible speech produces more fixations toward the speaker’s mouth (Lansing and McConkie, 2003; Vatikiotis-Bateson et al., 1998). Gaze allocation towards the mouth in speech-comprehension tasks was even higher in more difficult conditions where audible noise was added to the speech audio (Vatikiotis-Bateson et al., 1998), or when video stimuli of speaking faces lacked audio entirely (Lansing and McConkie, 2003). Moreover, the increase in gaze allocation towards the mouth region of speakers seems to occur even when the viewing task requires no explicit speech comprehension, such as when viewers were asked to merely rate how much they liked the video clip (Vo et al., 2012). However, when observers are aware that the speech component of audiovisual stimuli such as speaking faces is not meaningful, as in a recent study by de Boer et al. (2020), they seem to actively shift their visual attention away from the mouth. Participants in that study were shown videos of people producing meaningless word-like speech, and the researchers observed less visual attention to the mouth than in a condition without the audio, in which observers could not tell that the ‘speech’ was meaningless.
With regard to the effects of auditory information on the perception of facial expressions, several studies in adults have already shown auditory stimuli to be capable of influencing the perception of emotional faces. For instance, Carroll and Young (2005) found that facial expressions were identified faster and more accurately when they were primed by non-verbal emotional sounds (e.g., growling, weeping, laughter, etc.) that were associated with the same emotion as the target face. Collignon et al. (2008) likewise found that videos of fearful and disgusted facial expressions were easier to classify when they were accompanied by non-verbal vocalizations of the same emotion. Similarly, Pell (2005) found that facial expressions were easier to recognize when they were primed by a nonsense sentence spoken with a prosody of the same emotion category as the facial expression than when the prosody and the facial expression were mismatched. More recently, Filippi et al. (2017) demonstrated that participants were able to identify the emotions of facial expressions faster when they were accompanied by words matching the emotion of the face in meaning and prosody, relative to conditions where the emotion of the face and either the word meaning or prosody were mismatched. In a developmental study, Vesker et al. (2018c) found that when participants were asked to categorize facial expressions as positive or negative, both children and adults made more errors in categorizing positive faces when they were primed with auditory negative emotion words.
Therefore, it appears that emotional verbal stimuli can be effective as auditory primes in eliciting effects on the subsequent processing, and perhaps on the gaze patterns used for the visual encoding of facial expressions. However, to our knowledge, there have been no studies that investigated possible developmental changes in the effects of emotion words on the gaze patterns used for the perception of facial expressions.
1.3. The Present Study
Thus, the present study was designed to investigate the developmental changes in the role of gaze patterns in the perception of emotional facial expressions, and more specifically how such changes and patterns might be affected by auditory emotion words. Gaining additional insights into these relationships is crucial for improving our understanding of how children develop their ability to effectively interpret socially relevant information from facial expressions and speech.
For our study, we chose to use an emotional face categorization task (into positive or negative valence categories) rather than an emotion identification task like the one used by Pollux et al. (2014) described above. This choice was based on previous findings that typical emotion identification tasks may inadvertently bias results when comparing outcomes for positive and negative facial expressions: the positive valence category contains fewer of the primary facial expressions than the negative valence category, and negative facial expressions may thus be fundamentally more difficult to accurately identify than positive facial expressions (Kauschke et al., 2019; Nummenmaa and Calvo, 2015; Xu et al., 2019). Additionally, thanks to its simplicity, such a task should be equally easy to understand and carry out for participants from a broad range of age groups, which is particularly advantageous for us as we intended to study children as young as six years old.
In order to test for the influence of auditory emotion words on gaze behavior, the face targets in our categorization task were either primed with positive or negative emotion words, or unprimed. In the primed trials, the target facial expressions for categorization could be either congruent or incongruent in terms of emotion valence with the emotion word primes. Thus, positive word primes, negative word primes, or no primes at all would precede the appearance of positive or negative target facial expressions, which participants were required to categorize as positive or negative as quickly as possible. We were interested in analyzing not only how the participants’ overall gaze distribution might change depending on the word primes, but also in the order in which participants examine the regions of the face. Our groups of participants included both children and adults in order to observe how such effects might change over the course of development.
2. Methods
2.1. Participants
We tested participants from three age groups: six-year-old children, 12-year-old children, and adults. Based on sample sizes in previous studies that used similar categorization tasks (Bahn et al., 2017; Vesker et al., 2018a; Vesker et al., 2018c), we aimed to test 20 participants per age group. In total, we tested 24 six-year-old children (13 female, 11 male), 22 12-year-olds (12 female, 10 male), and 22 adults (13 female, nine male, age range: 19–33 years, mean age = 24.5 years, SD = 3.7 years). From that total sample, we excluded two six-year-old children due to technical problems during testing. Additionally, we excluded two six-year-olds and one 12-year-old due to insufficient quality of the eye-tracking data (less than 40% of data points successfully recorded at a sampling frequency of 300 Hz). Finally, two more children (one six-year-old, and one 12-year-old) were excluded for failing the WWT 6–10 vocabulary test (see below), and one additional six-year-old was excluded due to an overall accuracy rate of below 60% in the categorization task.
After the above exclusions, the final analyzed data set consisted of data from 60 participants: 18 six-year-olds (nine female, nine male), 20 12-year-olds (11 female, nine male), and 22 adults (13 female, nine male).
2.2. Stimuli
2.2.1. Face Targets for Categorization
Our target stimuli consisted of 48 facial expressions (24 positive, 24 negative) that were shared with us by the lab of Dr. Marc Pell at McGill University (Pell, 2005). These were photographs of eight models (four male, four female) against a gray background (see Fig. 1 for an example). Each model contributed three negative expressions and three positive expressions to our set. Negative expressions included sad, angry, and fearful faces. Disgusted faces were not included in our selection because previous studies show that children often confuse them with angry faces (Widen and Russell, 2003, 2010); references to negative facial expressions in our experiment should therefore be read with this exclusion of disgust in mind. Positive expressions included happy faces with a closed mouth, and happy faces with an open mouth showing a mixture of happiness and surprise. These expressions were selected to ensure that the positive and negative face categories were balanced in terms of their average valence and arousal (Vesker et al., 2018b).
Examples of a female negative (fear) face stimulus (left), and a male positive (happy) face stimulus (right), with the eyes and mouth region areas of interest (AOIs) used for our gaze analyses.
Each face was surrounded by a dark gray border, and these stimuli appeared as 11 cm/8.067° wide, and 16 cm/11.712° high on the display when seen by participants from a viewing distance of 78 cm.
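For reference, the reported visual angles follow from the standard relation between stimulus size and viewing distance. The short Python sketch below is our own illustration (not part of the original stimulus-preparation pipeline) and reproduces the values stated above.

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by a stimulus of a given size at a given viewing distance."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

VIEWING_DISTANCE_CM = 78  # chinrest-to-monitor distance used in the experiment

print(round(visual_angle_deg(11, VIEWING_DISTANCE_CM), 3))  # face width:  ~8.067 degrees
print(round(visual_angle_deg(16, VIEWING_DISTANCE_CM), 3))  # face height: ~11.712 degrees
```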
2.2.2. Emotion Word Primes
Our auditory primes consisted of 48 German emotion words (24 positive, 24 negative) from the BAWL-R database (Võ et al., 2009), each of which was recorded in a neutral tone of voice twice, once by a trained male speaker and once by a trained female speaker. The positive and negative word categories were balanced in terms of average valence, arousal, word length, number of phonemes, age of acquisition, and other linguistic parameters (Bahn et al., 2018).
Participants heard the word primes over a pair of headphones, and the volume was individually adjusted for comfort and clarity for each participant during the training phase of the experiment.
2.3. Apparatus
All children were tested to ensure appropriate development for their age using the Wortschatz- und Wortfindungstest WWT 6–10 German vocabulary test (Glück, 2011), and Raven’s Coloured Progressive Matrices (CPM), a non-verbal visual pattern-matching intelligence test.
During the experiment, visual stimuli were displayed on a 23-inch (51 cm wide, 28.5 cm high) LCD monitor (Dell Inc., Round Rock, TX, USA), with a stand-alone TX300 eye-tracker (Tobii, Stockholm, Sweden) positioned underneath. The monitor resolution was set at 1920 × 1080 pixels, with a 60-Hz refresh rate. The eye-tracker used a sampling frequency of 300 Hz. Please see https://www.tobiipro.com/siteassets/tobii-pro/product-descriptions/tobii-pro-tx300-product-description.pdf for additional technical specifications of the Tobii model TX300 eye-tracker.
Participants were seated for the duration of the experiment, and their heads were kept in position using a chinrest positioned 78 cm away from the monitor. Participants heard the auditory primes over a pair of over-ear headphones (Sennheiser, Wedemark, Germany), and responded to the face stimuli using a pair of X-keys Orby circular buttons (P.I. Engineering, Williamston, MI, USA). Each response button was black and was labeled underneath with the corresponding response using a simplified symbol: a sun for the positive response button, and a raincloud for the negative response button.
The experimental presentation was controlled using E-Prime 2.0 (Psychology Software Tools, Sharpsburg, PA, USA). The recording of the eye-tracking data as well as the eye-tracking analysis were performed using Tobii Studio version 3.4.7 (Tobii AB). Statistical analyses were carried out in SPSS version 26 (IBM Corp., Armonk, NY, USA).
2.4. Procedure
The experimental procedure was conducted in accordance with the German Psychological Society’s (DGPs) Research Ethics Guidelines. The experimental procedure and informed consent protocol were approved by the Office of Research Ethics at the University of Giessen. Upon arriving, each participant (or the parent in the case of child participants) was informed about the study procedure, and signed informed consent forms for participation and data analysis.
After the informed consent forms were signed, each participant was talked through the study procedure, with special care taken to ensure that the child participants understood the concept of positive and negative emotion categories. Participants were then seated in front of the monitor, and instructed to place their chins on a chinrest, with the response buttons placed under their hands in a comfortable position. The orientation of the buttons (whether the positive or negative button was placed on the left or the right side) was randomized across participants. Participants were then provided with a set of headphones. Once the participants were comfortable, the experimenter initiated a five-point calibration sequence.
2.4.1. Practice Trials
After calibration, a practice session of 12 categorization trials was carried out using face and word stimuli that did not appear in the main experiment. The structure of the practice trials was the same as in the main experiment and will be described in greater detail below. The practice trials were also used to adjust the sound volume for each participant’s comfort.
If participants made no more than a single error in the 12 practice trials, they proceeded immediately to the main experiment. If more than a single mistake was made, the 12-trial practice session was repeated one more time, before the participant advanced to the main experiment.
2.4.2. Main Experiment
The main experiment consisted of 144 trials: 48 primed congruently (e.g., a positive word prime, followed by a positive target face), 48 primed incongruently (e.g., a positive word prime, followed by a negative target face), and 48 unprimed (e.g., a negative face target with no word prime). Each of the 48 faces appeared three times during the main experiment (primed congruently, primed incongruently, and unprimed). Each of the 48 emotion word primes was heard twice by each participant (using the recording from the speaker of the same sex as the target face), once as a congruent prime, and once as an incongruent prime. The congruent, incongruent, and unprimed trials were intermixed in random order. Participants were instructed to categorize each target face as positive or negative as quickly as possible.
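To illustrate the resulting design, the following Python sketch assembles a 144-trial list with the structure described above (each face appearing once per condition, and each word used once as a congruent and once as an incongruent prime). The stimulus identifiers and field names are hypothetical, and the matching of speaker sex to face sex is omitted; the actual experiment was run in E-Prime.

```python
import random

# Hypothetical stimulus identifiers: 24 positive and 24 negative faces and words.
faces = [(f"face_{i:02d}", "positive" if i < 24 else "negative") for i in range(48)]
words = {"positive": [f"pos_word_{i:02d}" for i in range(24)],
         "negative": [f"neg_word_{i:02d}" for i in range(24)]}

def build_trials(seed: int = 0) -> list:
    rng = random.Random(seed)
    # Each word serves once as a congruent prime and once as an incongruent prime.
    congruent = {"positive": list(words["positive"]), "negative": list(words["negative"])}
    incongruent = {"positive": list(words["negative"]), "negative": list(words["positive"])}
    trials = []
    for face_id, valence in faces:
        trials.append({"face": face_id, "valence": valence,
                       "prime": congruent[valence].pop(), "condition": "congruent"})
        trials.append({"face": face_id, "valence": valence,
                       "prime": incongruent[valence].pop(), "condition": "incongruent"})
        trials.append({"face": face_id, "valence": valence,
                       "prime": None, "condition": "unprimed"})
    rng.shuffle(trials)  # congruent, incongruent, and unprimed trials are intermixed
    return trials

assert len(build_trials()) == 144  # 48 congruent + 48 incongruent + 48 unprimed
```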
After the main experiment, each child participant underwent the CPM non-verbal intelligence assessment and WWT 6–10 German vocabulary test, to ensure an appropriate level of development for their age.
2.4.3. Individual Trial Structure
Each trial began with a black circle (diameter 2.5 cm/1.836°) appearing on a light gray background in the middle of the screen, and participants were instructed to fixate inside the circle. As soon as a fixation inside the circle was detected, the trial advanced to the priming stage, which lasted 1000 ms, with the fixation circle remaining on the screen throughout. During the priming stage, the participant either heard the prime word played over headphones in the primed trials, or heard nothing in the unprimed condition. At the end of the priming stage, the fixation circle disappeared and the trial advanced to the categorization stage. During the categorization stage, the target face for categorization appeared on either the left or the right side of the screen, with a distance of 7.3 cm/5.358° between the center of the screen and the edge of the stimulus nearest to the center. The stimulus was vertically centered on the screen, and remained visible until the participant responded, or up to a maximum of 5000 ms. At the end of the categorization stage, the stimulus disappeared, and the experiment paused for 500 ms before the start of the next trial. See Fig. 2 below for an illustration of the trial procedure.
An illustration of the experimental trial procedure.
2.5. Analysis
After the exclusion of some data based on criteria which will be described in greater detail below for each analysis, four separate repeated-measures ANOVAs were used to examine our participants’ accuracy, response time, total fixation time for individual areas of interest (AOIs), and time to first fixation (TTFF) for individual AOIs. In cases where Mauchly’s test detected significant violations of sphericity among within-subject factors, Greenhouse–Geisser corrections were used to adjust the degrees of freedom (indicated by ggc). Bonferroni corrections for multiple comparisons were applied to all post-hoc pairwise comparisons.
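All statistical analyses were run in SPSS. As a rough open-source analogue of the sphericity handling described above, the Python sketch below (using the pingouin package, with hypothetical column names and a single within-subject factor rather than the full mixed designs) shows how Mauchly’s test and a Greenhouse–Geisser-corrected repeated-measures ANOVA can be obtained.

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format data: one row per participant x prime-type cell mean.
df = pd.read_csv("cell_means.csv")  # columns: participant, prime_type, accuracy

# Mauchly's test of sphericity for the three-level within-subject factor.
print(pg.sphericity(df, dv="accuracy", subject="participant", within="prime_type"))

# One-way repeated-measures ANOVA; correction=True additionally reports the
# Greenhouse-Geisser-corrected p-value for use when sphericity is violated.
print(pg.rm_anova(data=df, dv="accuracy", subject="participant",
                  within="prime_type", correction=True, detailed=True))
```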
3. Results
3.1. Behavioral Results
3.1.1. Accuracy
We began by analyzing the accuracy of participants’ responses, after having excluded all responses from participants who failed any of the general exclusion criteria (minimum 60% overall accuracy, passing the WWT, passing the CPM, and a minimum of 40% eye-tracker sampling success). Additionally, we excluded the data from any individual trials in which the response time was longer than three standard deviations above that participant’s mean response time, as participants were likely to have been distracted during those trials. Applying these criteria excluded approximately 6.62% of the raw data, leaving data from 8,471 individual trials in the present analysis, which were then averaged across participants for each combination of factors (detailed below).
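A minimal pandas sketch of the per-participant response-time cutoff is given below; the column names are hypothetical, but the rule is the one stated above (drop trials whose response time exceeds the participant’s own mean by more than three standard deviations).

```python
import pandas as pd

trials = pd.read_csv("trial_data.csv")  # hypothetical columns: participant, rt_ms, ...

# Per-participant mean and standard deviation of response times.
rt_stats = (trials.groupby("participant")["rt_ms"]
            .agg(["mean", "std"])
            .rename(columns={"mean": "rt_mean", "std": "rt_sd"}))
trials = trials.join(rt_stats, on="participant")

# Keep only trials no more than three SDs above that participant's mean RT.
cleaned = trials[trials["rt_ms"] <= trials["rt_mean"] + 3 * trials["rt_sd"]].copy()
```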
We carried out a 3 × 2 × 3 three-way repeated-measures ANOVA with participants’ accuracy scores as the dependent variable, and prime type (positive words, negative words, no prime) and face valence (positive faces, negative faces) as within-subject factors, and age (six-year-olds, 12-year-olds, adults) as a between-subjects factor.
We found a significant main effect of age (
We also found a significant two-way interaction between the factors of face valence and prime type (
3.1.2. Response Time
Next, we analyzed the response times for correct responses. For this analysis, we applied the same exclusion criteria described above, and additionally excluded the data from trials with incorrect responses (an additional exclusion of 5.76% of the raw data), leaving data from 7,949 trials, which were averaged across participants for each combination of factors.
We carried out a 3 × 2 × 3 three-way repeated-measures ANOVA with correct response times in milliseconds as the dependent variable, and prime type (positive words, negative words, no prime) and face valence (positive faces, negative faces) as within-subject factors, and age (six-year-olds, 12-year-olds, adults) as a between-subjects factor.
We found a significant main effect of age (
Another significant main effect was that of face valence (
A third significant main effect was that of prime type (
Finally, regarding the effects of congruency between the prime and target, we found a significant interaction between the factors of face valence and prime type (
3.1.3. Fixation Time
Gaze Analysis — Total Fixation Time. In this analysis, we investigated the total fixation time (the summed duration of all fixations in a given AOI per trial) which participants devoted to the eyes and mouth regions of the target faces during categorization, with both the eyes and mouth AOIs set at 448 pixels wide by 175 pixels high for analysis (Fig. 1). For this analysis we did not exclude any more trials than in the previous analysis, but now had two data points per trial (one for the eye region of the face, and one for the mouth region), with the data being averaged across participants for each combination of factors.
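In our study these measures were computed by Tobii Studio; purely as an illustration of the underlying quantity, the sketch below sums fixation durations falling inside a rectangular AOI. Only the 448 × 175-pixel AOI size is taken from above; the AOI positions and column names are hypothetical.

```python
import pandas as pd

AOI_W, AOI_H = 448, 175  # AOI size in pixels (eyes and mouth AOIs, see Fig. 1)
AOIS = {"eyes": (736, 300), "mouth": (736, 620)}  # hypothetical top-left corners

def total_fixation_time(fixations: pd.DataFrame, aoi: str) -> float:
    """Sum the durations (s) of all fixations whose centre falls inside the AOI."""
    left, top = AOIS[aoi]
    inside = (fixations["x"].between(left, left + AOI_W) &
              fixations["y"].between(top, top + AOI_H))
    return float(fixations.loc[inside, "duration_s"].sum())

# `fixations` is assumed to hold one row per fixation in a single trial,
# with columns x, y (fixation centre in pixels) and duration_s.
```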
We carried out a four-way 2 × 3 × 2 × 3 repeated-measures ANOVA on participants’ total fixation time with face valence (positive faces, negative faces), prime type (positive words, negative words, no prime), and AOI (mouth, eyes) as within-subject factors, and age (six-year-olds, 12-year-olds, adults) as a between-subjects factor. The dependent variable was the total fixation time in seconds for the corresponding AOI.
We found a significant main effect of age (
We also found a significant main effect of prime type (
Another significant two-way interaction was between prime type and face valence (
We also found a significant two-way interaction between AOI and age (
The two above interactions were further qualified by a three-way interaction between age, face valence, and AOI (
Total fixation time in seconds as influenced by the factors of age, face valence, and area of interest (AOI). Error bars represent standard error.
Our analysis also revealed one more significant three-way interaction between age, prime type, and AOI (
3.1.4. Time to First Fixation (TTFF)
Gaze Analysis — Time to First Fixation. In this analysis, we examined the TTFF for the mouth versus the eye regions as defined by the eyes and mouth AOIs described earlier. The TTFF measure for each AOI is the time in seconds from the start of the trial until the first fixation was registered within that AOI. This analysis can tell us which AOI participants fixate on first, and how quickly they switch to the second AOI. In addition to all the data exclusions described in the previous analysis, we also omitted data from trials in which participants happened by chance to already be fixating the location where the stimulus would appear (eliminating an additional 4.12% of the raw data). These data points (up to two from each trial, one for each AOI) were then averaged across participants for each combination of factors.
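Analogously to the total-fixation-time sketch above, the TTFF for a given AOI corresponds to the onset of the earliest fixation landing in that region, measured from trial start. The illustration below reuses the same hypothetical AOI definitions and column names.

```python
import pandas as pd

AOI_W, AOI_H = 448, 175
AOIS = {"eyes": (736, 300), "mouth": (736, 620)}  # hypothetical top-left corners

def time_to_first_fixation(fixations: pd.DataFrame, aoi: str) -> float:
    """Time (s) from trial start to the onset of the first fixation inside the AOI."""
    left, top = AOIS[aoi]
    inside = (fixations["x"].between(left, left + AOI_W) &
              fixations["y"].between(top, top + AOI_H))
    onsets = fixations.loc[inside, "onset_s"]  # fixation onsets relative to trial start
    return float(onsets.min()) if not onsets.empty else float("nan")
```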
We carried out a four-way 2 × 3 × 2 × 3 repeated-measures ANOVA on the TTFF measures with face valence (positive faces, negative faces), prime type (positive words, negative words, no prime), and AOI (mouth, eyes) as within-subject factors, and age (six-year-olds, 12-year-olds, adults) as a between-subjects factor. The dependent variable was the TTFF measurement in seconds for the corresponding AOI.
We detected a significant main effect of age (
The second significant main effect was that of prime type (
We found another two-way interaction between age and AOI (
Time to First Fixation (TTFF) in seconds as influenced by the factors of age, prime type, and area of interest (AOI). Error bars represent standard error.
We also found one additional significant two-way interaction between the factors of face valence and AOI (
4. Discussion
In the present study we wanted to investigate whether and how auditory emotion words might influence the perception of emotional facial expressions, whether this influence changes over the course of development, and what role gaze behavior plays in this relationship. Our study yielded a number of interesting results in terms of the gaze behavior our participants exhibited during the categorization task, as well as the effects of priming with emotion words.
In addition to the overall improvements in speed and accuracy with the increasing age of the participants, older participants were also less sensitive to cross-modal interference from the auditory primes: only the six-year-old children showed higher accuracy for congruently primed trials versus incongruently primed trials, while the 12-year-olds and adults did not exhibit this effect. This finding seems to fit with earlier observations that younger participants are more sensitive to interference from auditory information in visual tasks, such as in a visual selective attention study by Robinson et al. (2018), although this particular study did not involve emotion processing.
We also found that the six-year-olds were the most affected by the presence of any prime, whether congruent or not, as compared to unprimed trials. In this age group we found that unprimed trials showed the slowest response times, longest total fixation durations, and the slowest TTFF measures, out of all three conditions. It therefore seems possible that for the six-year-olds the presence of a word prime of any valence could have facilitated the deployment of visual attention to and processing of the face target. This possibility should be examined in future experiments using additional conditions such as emotionally neutral word primes and audible non-word primes. Such approaches could help to clarify whether it is truly the emotion word nature of the primes in our experiment that produced this effect, or whether more general language or auditory priming is responsible.
Another set of interesting results concerned the general distribution of gaze between the eyes and mouth region AOIs. The total fixation time and TTFF measures for these AOIs showed that children and adults exhibited different behaviors: while the adults tended to make the first fixation on the eye region and spent longer fixating within it, children tended to show more similar total fixation times and TTFF measures for the mouth and eye regions. In fact, with respect to total fixation time, the six-year-old children even showed significantly longer looking times for the mouths of positive faces. This increase in visual attention toward the eyes with increasing age, particularly the results pertaining to total fixation time, fits the results reported by Pollux et al. (2014), who also found that, prior to any training, children tended to fixate longer on the mouth while adults fixated longer on the eyes. In terms of the reason for this change, it might be that the children used a simplified strategy to process the facial expressions, relying primarily on assessing whether the target face is smiling, whereas the adults seem to have used a more sophisticated strategy relying more on the eyes than the mouth.
This pattern in adult participants seems unsurprising, as numerous studies have shown the eye region to be crucial for face perception, both in terms of perceiving facial expressions (e.g., Eisenbarth and Alpers, 2011) and facial identity (e.g., Rossion et al., 2009), as well as being an important component of holistic face processing (e.g., Nemrodov et al., 2014). However, one must still ask why the adult participants should use a more sophisticated gaze strategy focused on the eyes when the mouth-focused gaze behavior of the children was still sufficient to produce an adequate level of performance in our task. One possibility is that adults automatically assess the authenticity of emotional facial expressions in addition to evaluating them against typical expression prototypes. The eyes would naturally be an important marker for such an evaluation of expression authenticity (Williams et al., 2001), and it could be that such automatic authenticity-evaluation behavior develops gradually as children gain experience with social situations of increasing complexity.
More generally, we also found that congruently primed trials were always responded to faster than unprimed trials, but only negative face targets showed significantly faster categorization in congruent trials compared to incongruent trials. Finally, we found that TTFF measures for the mouth region were generally slower for negative faces compared to positive faces, indicating that perhaps the mouth is less important for the categorization of negative expressions compared to positive expressions, which is consistent with the findings of Eisenbarth and Alpers (2011) and Schurgin et al. (2014).
Our study was limited by several factors. One limitation was the need to keep the task as simple as possible for our youngest age group, which led to some ceiling effects, particularly in the group of adults when it came to accuracy. Another limitation was the use of still photographs as categorization targets and single word primes as stimuli. Therefore, future studies could increase the ecological validity of our findings by using dynamic facial expressions as stimuli, as well as including more extended auditory primes (e.g., phrases or entire sentences). The usage of dynamic stimuli in the form of speaking faces might, for example, reveal additional multisensory effects since the movement of the mouth might attract more visual attention to that region and may influence participants’ processing of auditory stimuli (Smith et al., 2016). Additionally, the priming paradigm itself could be supplemented by conditions where the auditory signal and the facial expression are presented simultaneously in order to test whether our findings hold true when using more integrated audiovisual stimuli. Regarding the auditory primes, the use of semantic meaning as the positive/negative differentiator might have limited the influence of the primes, particularly in the younger children who might have been less familiar with some of the words. It would thus be interesting for future studies to also consider manipulating the emotional prosody of auditory word primes to potentially enhance the priming effect, especially when studying younger age groups. Finally, it would be useful for future studies to include an additional emotionally neutral condition in order to provide a clearer baseline for how emotional expressions influence gaze behavior.
In summary, our findings seem to indicate that although congruent emotion word priming seems to improve categorization accuracy in six-year-old children, it appears to have little influence on gaze behavior. However, the presence of any emotion word prime, regardless of congruency, did seem to influence the speed of categorization and gaze timing of six-year-olds. It therefore appears that the sensitivity of emotional face categorization to priming with words (in terms of both performance and gaze behavior) decreases after the age of six years. Our findings also show that children and adults exhibit different gaze behaviors when categorizing emotional faces, with adults prioritizing the eye region, which the children do not appear to do.
Funding Information
The present study was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) — project number 222641018 — SFB/TRR 135 TP C3.
Conflicts of Interest
The authors declare no potential conflicts of interest with respect to this study.
Data Availability
The data used for our analyses can be found on the Zenodo repository under the title of this study.
To whom correspondence should be addressed. E-mail: Michael.Vesker@psychol.uni-giessen.de
References
Bahn, D., Kauschke, C., Vesker, M. and Schwarzer, G. (2018). Perception of valence and arousal in German emotion terms: a comparison between 9-year-old children and adults, Appl. Psycholinguist. 39, 463–481. DOI:10.1017/S0142716417000443.
Bahn, D., Vesker, M., García Alanis, J. C., Schwarzer, G. and Kauschke, C. (2017). Age-dependent positivity-bias in children’s processing of emotion terms, Front. Psychol. 8, 1268. DOI:10.3389/fpsyg.2017.01268.
Bodenschatz, C. M., Kersting, A. and Suslow, T. (2019). Effects of briefly presented masked emotional facial expressions on gaze behavior: an eye-tracking study, Psychol. Rep. 122, 1432–1448. DOI:10.1177/0033294118789041.
Calvo, M. G., Fernández-Martín, A., Gutiérrez-García, A. and Lundqvist, D. (2018). Selective eye fixations on diagnostic face regions of dynamic emotional expressions: KDEF-dyn database, Sci. Rep. 8, 17039. DOI:10.1038/s41598-018-35259-w.
Campanella, S. and Belin, P. (2007). Integrating face and voice in person perception, Trends Cogn. Sci. 11, 535–543. DOI:10.1016/j.tics.2007.10.001.
Carroll, N. C. and Young, A. W. (2005). Priming of emotion recognition, Q. J. Exp. Psychol. A 58, 1173–1197. DOI:10.1080/02724980443000539.
Collignon, O., Girard, S., Gosselin, F., Roy, S., Saint-Amour, D., Lassonde, M. and Lepore, F. (2008). Audio-visual integration of emotion expression, Brain Res. 1242, 126–135. DOI:10.1016/j.brainres.2008.04.023.
de Boer, M. J., Başkent, D. and Cornelissen, F. W. (2020). Eyes on emotion: dynamic gaze allocation during emotion perception from speech-like stimuli, Multisens. Res. 34, 17–47. DOI:10.1163/22134808-bja10029.
De Sonneville, L. M. J., Verschoor, C. A., Njiokiktjien, C., Op het Veld, V., Toorenaar, N. and Vranken, M. (2002). Facial identity and facial emotions: speed, accuracy, and processing strategies in children and adults, J. Clin. Exp. Neuropsychol. 24, 200–213. DOI:10.1076/jcen.24.2.200.989.
Eisenbarth, H. and Alpers, G. W. (2011). Happy mouth and sad eyes: scanning emotional facial expressions, Emotion 11, 860–865. DOI:10.1037/a0022758.
Filippi, P., Ocklenburg, S., Bowling, D. L., Heege, L., Güntürkün, O., Newen, A. and de Boer, B. (2017). More than words (and faces): evidence for a Stroop effect of prosody in emotion word processing, Cogn. Emot. 31, 879–891. DOI:10.1080/02699931.2016.1177489.
Gao, X. and Maurer, D. (2010). A happy story: developmental changes in children’s sensitivity to facial expressions of varying intensities, J. Exp. Child Psychol. 107, 67–86. DOI:10.1016/j.jecp.2010.05.003.
Glück, C. W. (2011). Wortschatz- und Wortfindungstest für 6- bis 10-Jährige: WWT 6–10. Urban & Fischer/ Elsevier, Munich, Germany.
Kauschke, C., Bahn, D., Vesker, M. and Schwarzer, G. (2017). Die semantische Repräsentation von Emotionsbegriffen bei Kindern im Grundschulalter, Kindheit Entwicklung 26, 251–260. DOI:10.1026/0942-5403/a000238.
Kauschke, C., Bahn, D., Vesker, M. and Schwarzer, G. (2019). The role of emotional valence for the processing of facial and verbal stimuli — positivity or negativity bias?, Front. Psychol. 10, 1654. DOI:10.3389/fpsyg.2019.01654.
Lansing, C. R. and McConkie, G. W. (2003). Word identification and eye fixation locations in visual and visual-plus-auditory presentations of spoken sentences, Percept. Psychophys. 65, 536–552. DOI:10.3758/BF03194581.
Mancini, G., Agnoli, S., Baldaro, B., Ricci Bitti, P. E. and Surcinelli, P. (2013). Facial expressions of emotions: recognition accuracy and affective reactions during late childhood, J. Psychol. 147, 599–617. DOI:10.1080/00223980.2012.727891.
Nakano, T., Tanaka, K., Endo, Y., Yamane, Y., Yamamoto, T., Nakano, Y., Ohta, N., Kato, N. and Kitazawa, S. (2010). Atypical gaze patterns in children and adults with autism spectrum disorders dissociated from developmental changes in gaze behaviour, Proc. R. Soc. B Biol. Sci. 277, 2935–2943. DOI:10.1098/rspb.2010.0587.
Nemrodov, D., Anderson, T., Preston, F. F. and Itier, R. J. (2014). Early sensitivity for eyes within faces: a new neuronal account of holistic and featural processing, NeuroImage 97, 81–94. DOI:10.1016/j.neuroimage.2014.04.042.
Nummenmaa, L. and Calvo, M. G. (2015). Dissociation between recognition and detection advantage for facial expressions: a meta-analysis, Emotion 15, 243–256. DOI:10.1037/emo0000042.
Paulmann, S. and Pell, M. D. (2011). Is there an advantage for recognizing multi-modal emotional stimuli?, Motiv. Emot. 35, 192–201. DOI:10.1007/s11031-011-9206-0.
Pell, M. D. (2005). Nonverbal emotion priming: evidence from the “facial affect decision task”, J. Nonverbal Behav. 29, 45–73. DOI:10.1007/s10919-004-0889-8.
Pollux, P. M. J., Hall, S. and Guo, K. (2014). Facial expression training optimises viewing strategy in children and adults, PLoS ONE 9, e105418. DOI:10.1371/journal.pone.0105418.
Robinson, C. W., Hawthorn, A. M. and Rahman, A. N. (2018). Developmental differences in filtering auditory and visual distractors during visual selective attention, Front. Psychol. 9, 2564. DOI:10.3389/fpsyg.2018.02564.
Rodger, H., Vizioli, L., Ouyang, X. and Caldara, R. (2015). Mapping the development of facial expression recognition, Dev. Sci. 18, 926–939. DOI:10.1111/desc.12281.
Rossion, B., Kaiser, M. D., Bub, D. and Tanaka, J. W. (2009). Is the loss of diagnosticity of the eye region of the face a common aspect of acquired prosopagnosia?, J. Neuropsychol. 3, 69–78. DOI:10.1348/174866408X289944.
Scheller, E., Büchel, C. and Gamer, M. (2012). Diagnostic features of emotional expressions are processed preferentially, PLoS ONE 7, e41792. DOI:10.1371/journal.pone.0041792.
Schurgin, M. W., Nelson, J., Iida, S., Ohira, H., Chiao, J. Y. and Franconeri, S. L. (2014). Eye movements during emotion recognition in faces, J. Vis. 14, 14. DOI:10.1167/14.13.14.
Smith, H. M. J., Dunn, A. K., Baguley, T. and Stacey, P. C. (2016). Matching novel face and voice identity using static and dynamic facial images, Atten. Percept. Psychophys. 78, 868–879. DOI:10.3758/s13414-015-1045-8.
Vatikiotis-Bateson, E., Eigsti, I.-M., Yano, S. and Munhall, K. G. (1998). Eye movement of perceivers during audiovisual speech perception, Percept. Psychophys. 60, 926–940. DOI:10.3758/bf03211929.
Vesker, M., Bahn, D., Degé, F., Kauschke, C. and Schwarzer, G. (2018a). Developmental changes in the categorical processing of positive and negative facial expressions, PLoS ONE 13, e0201521. DOI:10.1371/journal.pone.0201521.
Vesker, M., Bahn, D., Degé, F., Kauschke, C. and Schwarzer, G. (2018b). Perceiving arousal and valence in facial expressions: differences between children and adults, Eur. J. Dev. Psychol. 15, 411–425. DOI:10.1080/17405629.2017.1287073.
Vesker, M., Bahn, D., Kauschke, C., Tschense, M., Degé, F. and Schwarzer, G. (2018c). Auditory emotion word primes influence emotional face categorization in children and adults, but not vice versa, Front. Psychol. 9, 618. DOI:10.3389/fpsyg.2018.00618.
Võ, M. L. H., Conrad, M., Kuchinke, L., Urton, K., Hofmann, M. J. and Jacobs, A. M. (2009). The Berlin affective word list reloaded (BAWL-R), Behav. Res. Meth. 41, 534–538. DOI:10.3758/BRM.41.2.534.
Vo, M. L. H., Smith, T. J., Mital, P. K. and Henderson, J. M. (2012). Do the eyes really have it? Dynamic allocation of attention when viewing moving faces, J. Vis. 12, 3. DOI:10.1167/12.13.3.
Widen, S. C. and Russell, J. A. (2003). A closer look at preschoolers’ freely produced labels for facial expressions, Dev. Psychol. 39, 114–128. DOI:10.1037/0012-1649.39.1.114.
Widen, S. C. and Russell, J. A. (2010). The “disgust face” conveys anger to children, Emotion 10, 455–466. DOI:10.1037/a0019151.
Williams, L. M., Senior, C., David, A. S., Loughland, C. M. and Gordon, E. (2001). In search of the “Duchenne smile”: evidence from eye movements, J. Psychophysiol. 15, 122–127. DOI:10.1027//0269-8803.15.2.122.
Xu, Q.-R., He, W. C., Ye, C.-X. and Luo, W.-B. (2019). Attentional bias processing mechanism of emotional faces: anger and happiness superiority effects, Acta Psychol. Sin. 71, 86–94. DOI:10.13294/j.aps.2018.0098.