We examined the effect of audiovisual training on learning a repeated sequence of motor responses. Participants were trained with either congruent or incongruent audiovisual cues to produce motor responses. Learning was tested by comparing reaction times to untrained sequences and by asking participants to recreate the trained sequence. A strong association was found between the two measures and the majority of high-scoring participants belonged to the congruent audiovisual condition. Because the second measure requires explicit knowledge of the trained sequence, we conclude that audiovisual congruency facilitates explicit learning.
Many everyday tasks require an individual to generate a specific pattern of motor movements in response to some stimulus. A pianist, for example, translates sheet music into a series of motor commands executed in a specific sequence. Eventually, the piece is played with little effort. This sequence learning is believed to arise from an interplay between multiple brain regions, including sensory, motor, and memory systems, and has become the focus of a large body of research (Keele et al., 2003; Robertson, 2007). Sequence learning is commonly studied using the serial reaction time (SRT) task, in which participants make rapid motor responses to a series of sensory cues that follow a repeating sequence (Keele et al., 2003; Nissen and Bullemer, 1987). Participants eventually learn the pattern despite not being informed that it exists, as evidenced by a reduction in reaction time (RT) when responding to the repeating sequence compared to a random sequence.
The majority of SRT studies have been conducted with visual stimuli, and only a handful have examined the SRT task in other sensory modalities, such as the auditory and tactile modalities (Abrahamse et al., 2008, 2009; Riedel and Burton, 2006). These studies have shown that sequence learning is not specific to visuomotor learning. For instance, Abrahamse et al. (2008) trained participants with either visual-only or tactile-only stimuli and found that both groups experienced sequence learning. Furthermore, this learning transferred from tactile to visual conditions and vice versa.
Although few studies have examined the SRT task outside of the visual modality, even fewer have investigated it under multisensory conditions. Yet multisensory training has been shown to facilitate other forms of learning, namely visual perceptual learning (Kim et al., 2008; Seitz et al., 2006). Therefore, it is worthwhile to consider whether sequence learning can be similarly facilitated by multisensory training. In one such study, Abrahamse et al. (2009) trained participants with visual-only, tactile-only, or visual-tactile cues. The authors found that, although all three groups experienced significant sequence learning, the tactile-only condition resulted in a smaller RT benefit than either the visual-only or visual-tactile condition, and they found no apparent benefit of multisensory training. However, at least one instance of multisensory facilitation of sequence learning has been demonstrated by Hoffmann et al. (2001) and replicated by Stöcker et al. (2003). In these studies, participants responded to visual cues in an SRT task and each key press response was followed by a tone, either with a contingent mapping (a given key was mapped to a specific tone) or a non-contingent mapping (the tone produced by each key varied). The authors found that while both conditions resulted in significant sequence learning, the contingent mapping resulted in a greater RT benefit and greater explicit knowledge of the underlying sequence compared to the non-contingent mapping.
Studies such as these suggest not only that multisensory facilitation of sequence learning can occur, but also that the effect of multisensory training may manifest itself in more than one way (e.g., by producing an overall RT benefit, as well as enhancing explicit awareness of the sequence structure). In particular, the Hoffmann et al. (2001) and Stöcker et al. (2003) studies demonstrated that auditory feedback of a visually-cued motor sequence reinforces sequence learning. An interesting question is whether a similar facilitation is found when the auditory stimulus occurs simultaneously with the visual cue, as in other examples of multisensory facilitation of learning (Seitz et al., 2006). The simultaneous audiovisual cue may provide a better opportunity for bottom-up multisensory interactions to facilitate sequence learning compared to the auditory feedback employed by Hoffmann et al. (2001) and Stöcker et al. (2003). In the current study, we investigate this question by examining overall RT benefits and the extent of implicit and explicit awareness of a motor sequence cued by an audiovisual stimulus.
2. Material and Methods
Eighty UCLA undergraduates participated in this study for extra credit in their courses, as directed by the UCLA IRB in accordance with the Declaration of Helsinki. The participants viewed four unfilled horizontally-aligned circles on a computer monitor and placed the fingers of their right hand (index through pinky) over four keys, which corresponded to the four on-screen circles, as depicted in Fig. 1. When one circle became filled, the participant pressed the corresponding key, after which another circle became filled, resulting in a rapid series of responses. A different circle was filled 200 ms after each response.
Participants were randomly assigned to one of four auditory conditions (20 participants per condition). During the experiment, the onsets of the visual cues were paired with one of four auditory tone cues, which played until a key was pressed (using A440 standard and scientific pitch notation, the notes were C5, D5, E5, and F5). In the congruent condition, C5 was always paired with the far-right circle, D5 was always paired with the next circle, etc. Every tone was therefore uniquely and consistently mapped to a visual cue. In the random tone condition, there was no consistent audiovisual mapping, and any of the four tones randomly played each time a new circle was filled. In the single tone condition, the tone C5 was always paired with every visual cue. Finally, the auditory cue was absent throughout the SRT task in the no tone (visual-only) condition.
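The four auditory conditions can be summarized as a mapping from a visual cue to a tone. The following is a minimal sketch, not the experiment's actual code: the function names are hypothetical, the frequencies assume twelve-tone equal temperament (the text specifies only the A440 standard), and "the next circle" is assumed to mean the next circle moving leftward from the far right.

```python
import random

# Scientific pitch names mapped to MIDI note numbers (A4 = 69, so C5 = 72).
MIDI_NOTES = {"C5": 72, "D5": 74, "E5": 76, "F5": 77}
TONES = ["C5", "D5", "E5", "F5"]


def frequency_hz(note, a4=440.0):
    """Frequency of a note under A440, assuming equal temperament."""
    return a4 * 2 ** ((MIDI_NOTES[note] - 69) / 12)


def tone_for_cue(condition, circle, rng=random):
    """Return the tone paired with a visual cue, or None when no tone plays.

    `circle` is 1-4 counted from the left, as in the training sequences.
    """
    if condition == "congruent":
        # C5 with the far-right circle (4), D5 with the next one (3), etc.
        # The leftward direction of "next" is an assumption.
        return TONES[4 - circle]
    if condition == "random":
        # No consistent mapping: any of the four tones may play.
        return rng.choice(TONES)
    if condition == "single":
        # The same tone accompanies every visual cue.
        return "C5"
    if condition == "none":
        # Visual-only condition.
        return None
    raise ValueError(f"unknown condition: {condition}")
```

Only the congruent branch depends on the circle position, which is what makes the tone informative about the visual cue in that condition alone.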
Unbeknownst to the participants, the circles were filled according to a pre-determined 12-element sequence (the training sequence), which was repeated during the SRT task. Past SRT studies have used a single training sequence for all participants, but this may result in sequence-specific effects (DeCoster and O’Mally, 2011). To increase the generalizability of our results, a set of 20 unique training sequences was pre-generated and used in each of the four auditory conditions (i.e., one sequence per participant in each condition). Each element of a given training sequence could range from 1 to 4, which corresponded to the circles from left to right (i.e., 1 = far left circle, 2 = second circle from the left, etc.). The 20 sequences were randomly generated with the following restrictions:
1. Each element type (1, 2, 3, and 4) occurred three times per training sequence.
2. No four-element chunk (e.g., 1-4-2-1) within a given training sequence could repeat within that sequence.
3. Repeated elements (e.g., 3-3), sequential ‘runs’ (e.g., 1-2-3-4), and ‘reversals’ (e.g., 1-2-1-2) were not allowed, as these patterns are easily identifiable (Reed and Johnson, 1994; Vaquero et al., 2006).
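The three restrictions above can be sketched as a validity check combined with rejection sampling. This is an illustration under stated assumptions rather than the procedure actually used: the function names are hypothetical, chunks are checked within a single pass of the sequence (not across the wrap-around between repetitions), the descending run 4-3-2-1 is excluded along with 1-2-3-4, and a reversal is taken to be any four-element A-B-A-B pattern.

```python
import random

ELEMENTS = [1, 2, 3, 4]
CHUNK = 4


def chunks(seq, size=CHUNK):
    """All contiguous chunks of `size` elements, as tuples."""
    return [tuple(seq[i:i + size]) for i in range(len(seq) - size + 1)]


def is_valid(seq):
    """Check a candidate 12-element sequence against the three rules."""
    # Rule 1: each element type occurs exactly three times.
    if any(seq.count(e) != 3 for e in ELEMENTS):
        return False
    # Rule 3: no immediately repeated element (e.g., 3-3).
    if any(a == b for a, b in zip(seq, seq[1:])):
        return False
    windows = chunks(seq)
    # Rule 2: no four-element chunk occurs twice in the sequence.
    if len(set(windows)) != len(windows):
        return False
    for w in windows:
        # Rule 3: no run (excluding the descending run is an assumption).
        if w in ((1, 2, 3, 4), (4, 3, 2, 1)):
            return False
        # Rule 3: no reversal of the form A-B-A-B (e.g., 1-2-1-2).
        if w[0] == w[2] and w[1] == w[3]:
            return False
    return True


def generate_sequence(rng):
    """Rejection sampling: reshuffle until all three rules hold."""
    pool = ELEMENTS * 3
    while True:
        rng.shuffle(pool)
        if is_valid(pool):
            return tuple(pool)


def generate_set(n=20, seed=0):
    """Collect `n` distinct valid sequences."""
    rng = random.Random(seed)
    found = set()
    while len(found) < n:
        found.add(generate_sequence(rng))
    return sorted(found)
```

Calling `generate_set(20)` returns 20 distinct sequences, one per participant within a condition. Rejection sampling is slow in principle but adequate here, because the space of 12-element candidates is small.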
The SRT task was divided into 12 blocks. All blocks consisted of 12 repetitions of the training sequence, except for blocks 9 and 11, each of which comprised 12 untrained, unrepeated sequences generated using the same three rules. The untrained sequences were distinct from the training sequence: no four-element chunk from the training sequence was allowed to appear in any untrained sequence. Subtracting the mean RT of a block with the repeated training sequence (block 10) from the mean RT of a block with untrained sequences (block 9) yields a primary measure of sequence learning. Furthermore, no auditory cues were presented during blocks 11 and 12. We thus acquired a measure of sequence learning with audiovisual stimuli (Test 1: block 9 minus block 10) and with visual-only stimuli (Test 2: block 11 minus block 12).
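The two learning measures reduce to differences between block means. A minimal sketch follows, assuming each participant's RTs are stored as a mapping from block number to a list of RTs in milliseconds; the data layout and function name are hypothetical.

```python
from statistics import mean


def learning_scores(rts_by_block):
    """Compute both RT tests for one participant.

    `rts_by_block` maps a block number (1-12) to a list of RTs in ms.
    A positive score means faster responses to the trained sequence.
    """
    block_mean = {block: mean(rts) for block, rts in rts_by_block.items()}
    return {
        # Test 1: untrained block 9 minus trained block 10 (tones present).
        "test1_audiovisual": block_mean[9] - block_mean[10],
        # Test 2: untrained block 11 minus trained block 12 (no tones).
        "test2_visual_only": block_mean[11] - block_mean[12],
    }
```

For example, with invented values, block means of 510 ms (block 9) and 440 ms (block 10) give a Test 1 score of 70 ms.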
After completing the SRT task, participants were informed of the repeating sequence. Participants then completed an explicit and an implicit test of their sequence knowledge (adapted from Destrebecqz and Cleeremans, 2001). In the explicit test, participants were asked to reproduce the training sequence for 144 key presses. In the implicit test, participants were asked to press the keys in a completely random order for 144 key presses. In both tests, auditory cues were not presented; however, after each key press the corresponding on-screen circle was filled.
We calculated average block-wise and condition-wise RTs, illustrated in Fig. 2. To analyze sequence learning in the presence of audiovisual stimuli, we compared performance during blocks 9 and 10 across conditions by conducting a 2 (Sequence: untrained, trained) by 4 (Condition: congruent, random tone, single tone, no tone) mixed ANOVA. The results indicated a significant main effect of Sequence, , , , such that average RTs were faster for the trained sequence than the untrained sequence. There was also a significant main effect of Condition, , , . Pairwise comparisons indicated that both the congruent and random tone conditions had significantly faster RTs than the single tone condition, . Planned comparisons revealed that all four conditions had significantly faster RTs for the trained sequence than the untrained sequence, .
To examine the extent of sequence learning with visual-only stimuli, we compared performance during blocks 11 and 12 across conditions by conducting a 2 (Sequence) by 4 (Condition) mixed ANOVA. The results revealed a significant main effect of Sequence, , , , such that average RTs were faster for the trained sequence than the untrained sequence. Planned comparisons indicated that all four conditions had significantly faster RTs for the trained sequence than the untrained sequence, .
To quantify the magnitude of sequence learning with and without auditory stimuli across conditions, block 10 was subtracted from block 9 (audiovisual RT test in ms: congruent 73, random 58, single 61, no-tone 59) and block 12 was subtracted from block 11 (visual-only RT test in ms: congruent 68, random 54, single 71, no tone 73) for each participant. We then conducted a 2 (Stimulus: audiovisual, visual-only) by 4 (Condition) mixed ANOVA. There was no significant main effect of Stimulus, , or Condition, , nor was there a significant interaction, . Thus, although each condition experienced significant sequence learning, the magnitude of the learning did not change significantly when the auditory stimuli were removed, nor did the magnitude significantly vary across conditions.
Performance on the explicit and implicit tests was quantified by examining every chunk of four sequentially pressed keys a participant produced and checking whether it matched any four-element chunk from that participant’s training sequence. The number of matches was then converted to a percentage for each test by dividing the participant’s total number of four-element matches by the total number of possible matches. We then conducted a 2 (Test: explicit, implicit) by 4 (Condition) mixed ANOVA. There was a significant main effect of Test, such that mean performance in the explicit test (%, %) was significantly higher than in the implicit test (%, %), , , . Furthermore, there was a significant interaction between Test and Condition, , , . Pairwise comparisons revealed that in the explicit test, participants in the congruent condition (%, %) displayed significantly more explicit sequence knowledge compared to participants in the random tone condition (%, %), , , and trended toward having more explicit knowledge than participants in both the single tone condition (%, %), , , and the no tone condition (%, %), , . Figure 3A illustrates the implicit and explicit test results.
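The chunk-matching score described above can be sketched as a sliding window over the key presses. Two details are assumptions rather than facts from the text: training chunks are allowed to wrap around the end of the sequence (because the sequence repeated during training), and the "total number of possible matches" is taken to be the number of four-press windows (141 for 144 key presses). The function names are hypothetical.

```python
def training_chunks(training, size=4):
    """Four-element chunks of the training sequence.

    Chunks wrap around the end of the sequence, since the sequence
    repeats during training (an assumption of this sketch).
    """
    n = len(training)
    return {tuple(training[(i + j) % n] for j in range(size)) for i in range(n)}


def chunk_score(presses, training, size=4):
    """Percentage of `size`-press windows that match a training chunk."""
    targets = training_chunks(training, size)
    windows = [tuple(presses[i:i + size]) for i in range(len(presses) - size + 1)]
    if not windows:
        return 0.0
    matches = sum(w in targets for w in windows)
    return 100.0 * matches / len(windows)
```

Under these assumptions, a participant who reproduces the training sequence perfectly for all 144 key presses scores 100%, and one who presses a single key throughout scores 0%. The same function scores both tests: a high value on the explicit test indicates reportable sequence knowledge, whereas on the implicit test it indicates that trained chunks intrude despite the instruction to respond randomly.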
To explore the potential association between performance in the SRT task and participants’ knowledge of the underlying sequence structure, we correlated participants’ magnitude of sequence learning in the SRT task with their performance in the explicit and implicit tests. Because no significant differences were found between the audiovisual and visual-only RT tests, we collapsed performance across these two tests to obtain a single measure of each participant’s magnitude of sequence learning and then correlated this measure with performance in the explicit and implicit tests. When collapsed across conditions, we found a significant correlation between explicit sequence knowledge and magnitude of sequence learning, , . When examining this effect across conditions, a significant correlation between explicit sequence knowledge and magnitude of sequence learning was observed in the congruent condition, , , and the no tone condition, , , but not in the random tone or single tone conditions. Figure 3B plots RT scores as a function of explicit knowledge. No significant association was found between RT scores and implicit knowledge.
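The correlational analysis can be illustrated with a short sketch. The text does not state how the two RT tests were collapsed or which correlation coefficient was used; the code below assumes a per-participant average and Pearson's r, and both function names are hypothetical.

```python
from math import sqrt


def collapse(test1, test2):
    """Average the two RT tests per participant (averaging is an assumption)."""
    return [(a + b) / 2 for a, b in zip(test1, test2)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Given one list of collapsed RT scores and one list of explicit test percentages, `pearson_r` returns the overall association; restricting both lists to a single condition gives the condition-wise correlations.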
In this study, we tested the effect of concurrently-presented auditory and visual cues during an SRT task. We observed the typical SRT learning effect, characterized by faster RTs during the presentation of trained patterns than during the presentation of untrained patterns. Furthermore, the magnitude of the SRT learning effect did not differ significantly across auditory conditions.
Considering only the magnitude of each condition’s RT scores, it is tempting to conclude that there was no benefit or observable effect of training with congruent audiovisual stimuli compared to the control (audiovisual and visual-only) conditions. However, the results of the test of explicit knowledge indicate that the congruent condition resulted in greater explicit sequence knowledge compared to the other conditions, as a majority of participants with an explicit score of over 50% (7 out of 11) belonged to the congruent condition. This suggests that training with congruent multisensory stimuli encourages the development of explicit sequence knowledge. Note that these explicit test results are similar to those found by Hoffmann et al. (2001) as well as Stöcker et al. (2003). In contrast, the test of implicit knowledge failed to find any statistically significant effects, likely because the magnitude of scores on the implicit test was low. As a result, this test was not powerful enough to detect changes in implicit knowledge between conditions. A more powerful measure is required to examine the relationship between implicit learning and multisensory training.
Examining the correlation between explicit test performance and RT task performance across individual conditions reveals that the congruent and no-tone conditions independently achieved highly significant correlations. However, only the congruent condition achieved a good spread of data points. Participants in the other three conditions overwhelmingly scored similarly low in the explicit test, making interpretations of individual-condition correlation problematic. Nevertheless, the overall correlation results in a good spread of data points and a relatively strong relationship between both measures (). As a result, we conclude that the extent of explicit knowledge was likely the major driver of RT magnitude in our SRT task, again in agreement with Stöcker et al. (2003).
Further, we conclude that the random and single tone conditions are no better than the no-tone condition at encouraging the development of explicit knowledge, while participants in the congruent condition are more likely to develop more explicit knowledge. However, note that a slight majority of congruent-condition participants also attained low explicit test scores. Thus, while the average audiovisual congruent RT score was numerically higher than the other three scores, we lacked the appropriate statistical power to detect this small effect (∼10 ms benefit). Clearly, our congruent condition did not guarantee the development of high explicit knowledge, but merely increased the probability of explicit knowledge developing in a given participant relative to the other conditions. This probability may be modulated by traits inherent to each participant, though more targeted work on individual differences is required.
In conclusion, we found a strong correlation between explicit knowledge and RT test performance. Furthermore, the current study suggests that audiovisual congruency encourages the development of high levels of explicit knowledge in SRT tasks, while the level of explicit knowledge fostered by conditions without audiovisual congruency is comparable to that fostered by the standard visual-only SRT task.
This study was partially supported by National Science Foundation grant BCS-1057969.
Statement of conflicts of interest
Abrahamse E. L., Van der Lubbe R. H. J., Verwey W. B. (2008). Asymmetrical learning between a tactile and visual serial RT task, Q. J. Exp. Psychol. (Hove) 61, 210–217.
Abrahamse E. L., Van der Lubbe R. H. J., Verwey W. B. (2009). Sensory information in perceptual-motor sequence learning: visual and/or tactile stimuli, Exp. Brain Res. 197, 175–183.
Destrebecqz A., Cleeremans A. (2001). Can sequence learning be implicit? New evidence with the process dissociation procedure, Psychonom. Bull. Rev. 8, 343–350.
Hoffmann J., Sebald A., Stöcker C. (2001). Irrelevant response effects improve serial learning in serial reaction time tasks, J. Exp. Psychol. Learn. Mem. Cogn. 27, 470–482.
Keele S. W., Ivry R., Mayr U., Hazeltine E., Heuer H. (2003). The cognitive and neural architecture of sequence representation, Psychol. Rev. 110, 316–339.
Kim R. S., Seitz A. R., Shams L. (2008). Benefits of stimulus congruency for multisensory facilitation of visual learning, PLoS One 3, e1532. DOI:10.1371/journal.pone.0001532.
Nissen M. J., Bullemer P. (1987). Attentional requirements of learning: evidence from performance measures, Cogn. Psychol. 19, 1–32.
Reed J., Johnson P. (1994). Assessing implicit learning with indirect tests: determining what is learned about sequence structure, J. Exp. Psychol. Learn. Mem. Cogn. 20, 585–594.
Riedel B., Burton A. M. (2006). Auditory sequence learning: differential sensitivity to task relevant and task irrelevant sequences, Psychol. Res. 70, 337–344.
Stöcker C., Sebald A., Hoffmann J. (2003). The influence of response–effect compatibility in a serial reaction time task, Q. J. Exp. Psychol. A 56, 685–703.
Vaquero J. M. M., Jiménez L., Lupiáñez J. (2006). The problem of reversals in assessing implicit sequence learning with serial reaction time tasks, Exp. Brain Res. 175, 97–109.