Infants are able to learn novel associations between visual objects and auditory linguistic labels (such as a dog and the sound /dɔg/) by the end of their first year of life. Surprisingly, at this age they seem to fail to learn associations between visual objects and natural sounds (such as a dog and its barking sound). Researchers have therefore suggested that linguistic learning is special (Fulkerson and Waxman, 2007) or that unfamiliar sounds overshadow visual object processing (Robinson and Sloutsky, 2010). However, in previous studies visual stimuli were paired with arbitrary sounds in contexts lacking ecological validity. In the present study, we created animations of two novel animals and paired them with two realistic animal calls to construct two audiovisual stimuli. In the training phase, each animal was presented with motion that mimicked real-life animal behaviour: in a short movie, the animal ran (or jumped) from the periphery to the center of the monitor and made calls while raising its head. In the test phase, static images of both animals were presented side by side and the sound for one of the animals was played. Infants' looking times to each image were recorded with an eye tracker. We found that, following the sound, 12-month-old infants looked preferentially at the animal corresponding to the sound. These results show that 12-month-old infants are able to learn novel associations between visual objects and natural sounds in an ecologically valid situation, thereby challenging our current understanding of the development of crossmodal association learning.