The Development of Phonological Stratification: Evidence from Stop Voicing Perception in Gurindji Kriol and Roper Kriol

In: Journal of Language Contact
Jesse Stewart University of Saskatchewan

Felicity Meakins University of Queensland

Cassandra Algy Karungkarni Arts

, and
Angelina Joshua Ngukurr Language Centre

Open Access

This study tests the effect of multilingualism and language contact on consonant perception. Here, we explore the emergence of phonological stratification using two-alternative forced-choice (2AFC) identification task experiments to test listener perception of stop voicing with contrasting minimal pairs modified along a 10-step continuum. We examine a unique language ecology consisting of three languages spoken in Northern Territory, Australia: Roper Kriol (an English-lexifier creole language), Gurindji (Pama-Nyungan), and Gurindji Kriol (a mixed language derived from Gurindji and Kriol). In addition, this study focuses on three distinct age groups: children (group i, under 8), preteens to middle-aged adults (group ii, 10–58), and older adults (group iii, 65+). Results reveal that both Kriol and Gurindji Kriol listeners in group ii contrast the labial series [p] and [b]. Conversely, while alveolar [t] and velar [k] were consistently identifiable by the majority of participants (74%), their voiced counterparts ([d] and [g]) showed random response patterns in 61% of the participants. Responses to the voiced stimuli from the preteen-adult Kriol group were, however, significantly more consistent than in the Gurindji Kriol group, suggesting Kriol listeners may be further along in acquiring the voicing contrast. Significant results regarding listener exposure to Standard English in both language groups also suggest that constant exposure to English may be a catalyst for setting this change in motion. The more varied responses from the Gurindji, Kriol, and Gurindji Kriol listeners in groups ii and iii, who have little exposure to English, help support these findings.

1 Introduction

Contact languages provide a unique opportunity for analysing extensive language change in a considerably short period of time. While language change under normal circumstances can take generations before variation is quantifiable, the effects of language contact can often be seen in as little as a single generation. In the case of mixed languages, an extreme variety of contact language, entire lexical and grammatical elements can transfer from one language into another, often within a single generation. Mixed languages are often characterised as containing the lexical or grammatical patterns of linguistic elements from different languages, referred to as stratification. While there is a wealth of literature that explores lexical and grammatical stratification in mixed languages and languages under other intense contact scenarios (Hickey, 2010b; Matras and Bakker, 2003; S. Thomason, 1997), we are only just beginning to understand how the phonological systems from different languages interact at the phonetic level (both in production and perception) in situations of language contact. In the more classical sense, we are only just beginning to understand the effects of phonological interference in mixed languages. This paper adds to this literature with a perceptual study that explores a specific phonemic conflict site (a conflicting area of phonological convergence) involving stop voicing contrasts. Here, we provide a synchronic description of stop perception in three Australian languages that have either emerged or changed through considerable contact via English: Kriol, Gurindji Kriol, and Gurindji.

These languages provide an interesting test case for exploring the perceptual effects of language contact in that the clear majority of Australian languages do not have contrastive stop voicing, while English, the primary lexifier of Kriol and Gurindji Kriol, clearly does. We show that, based on perceptual data (and with further evidence from production data from previous studies (Bundgaard-Nielsen and Baker, 2016; Jones and Meakins, 2013)), it is likely that during the development of Kriol and Gurindji Kriol, the stop contrast was not initially present. Through constant and increasing contact with English and the recognition of Kriol/English cognates, however, there is evidence that a voicing contrast is now developing in both languages—an example of language contact adding complexity to a linguistic system. This study also demonstrates that Kriol is more advanced in the development of a voicing contrast than Gurindji Kriol, which is likely the result of 50 years more exposure to English through the earlier presence of formal Western education.

Importantly, this study provides further evidence that linguistic systems and complexity develop incrementally and with variation. Previous studies of loan phonology typically characterise loan words as either conforming wholesale to the recipient language phonology or as categorically introducing new phonemes into the recipient language in restricted areas of the lexicon (and thereby creating stratification) (Bullock, 2009; Campbell, 1996; Hyman, 1970; Itô and Mester, 1995; Matras, 2009). This study provides a more nuanced picture of how stratification occurs. In this respect, this study joins the growing morphosyntactic literature on mixed languages that demonstrates the complex nature of language development under intensive contact with other languages. Earlier studies often characterised mixed languages as faithfully replicating the morphosyntactic patterns from both of their source languages, however subsequent studies have noted that transferred grammatical elements often undergo change when they are absorbed into the recipient language (often under the influence of patterns in the recipient language). Furthermore, absorption is not categorical but is an incremental process, resulting in variation among speakers. For example, the transfer of the Gurindji ergative suffix into Gurindji Kriol in the genesis of this mixed language saw its transformation into an optional nominative case suffix under the influence of Kriol argument structure (Meakins, 2015). Similarly, this study captures two languages at different stages of developing phonological stratification, demonstrating how the contrast has developed in individual bilingual speakers and is incrementally propagating through the speaker communities.

1.1 Gurindji Kriol

Gurindji Kriol is a mixed language spoken in the Victoria River District of northern Australia, which is located 470 kilometres from the nearest town, Katherine. It emerged around 40 years ago and is now spoken by Gurindji people in the Aboriginal communities of Daguragu and Kalkaringi, and by Bilinarra and Ngarinyman people in two communities north of Kalkaringi: Pigeon Hole and Yarralin.

Gurindji Kriol originates in Gurindji (Ngumpin Yapa, Pama-Nyungan), the traditional Australian language of the region, and Kriol, the English-lexifier creole language spoken across much of northern Australia. It combines the lexicon and structure of these two languages. The structural mix of Gurindji Kriol is well documented, with Gurindji providing much of the noun phrase system and Kriol contributing the verb phrase system (e.g., Meakins, 2011). This type of mixed language is referred to as a V(erb)-N(oun) mixed language and includes Michif and Light Warlpiri (Matras and Bakker, 2003; Meakins, 2013b). The lexicon of Gurindji Kriol is also highly mixed. Based on a 200 word Swadesh list, 36.6% of vocabulary is derived from Kriol and 35% finds its origins in Gurindji. The remaining 28.4% contains synonymous forms from both languages (Meakins, 2011: 19). The extent of lexical mixing is shown in (1) below where Gurindji forms are given in italics and Kriol forms in plain font.

(1) (Meakins, 2011: 18)


Gurindji Kriol now has around 700 speakers. It is the main language spoken and acquired at Kalkaringi. Gurindji is still spoken by people over the age of 40 years, albeit generally code-switched with Kriol. All Gurindji people speak Kriol to varying extents when they visit Kriol-speaking areas to the north, for example Katherine and Timber Creek, but do not speak it at home. Standard Australian English is the language of the school despite the fact that children enter school with no background in English. English is also the language of the media and government services but it plays little role in people’s home lives (Meakins, 2008: 287–295).

Gurindji Kriol originated from contact between non-Indigenous colonists and the Gurindji people. In the early 1900s, white pastoralists set up cattle stations in the Victoria River District area, including on the homelands of the Gurindji. Many Gurindji people were killed in skirmishes over land, and the remaining people were put to work on Wave Hill Station in the early 1900s as stockmen and kitchen hands in slave-like conditions together with other Aboriginal groups such as the Bilinarra and Ngarinyman. In 1966 the Gurindji initiated a workers’ strike to protest against the poor working and living conditions and to ultimately regain control of their traditional lands. Today the Gurindji continue to live on their traditional lands at Kalkaringi (Charola and Meakins 2016).

The linguistic practices of the Gurindji are closely tied to these social circumstances. The establishment of the cattle stations by colonisers saw the introduction of the cattle station pidgin (the basis of Kriol) into the linguistic repertoire of the Gurindji. Code-switching was a common practice and it is likely that it provided a fertile ground for the formation of the mixed language (McConvell and Meakins, 2005; Meakins, 2011, 2012). The shift to a mixed language rather than monolingual Kriol was probably the result of the fact that Kalkaringi had only one dominant language (with other languages present such as Bilinarra and Ngarinyman mostly mutually intelligible) rather than many disparate languages spoken in one community that is a characteristic of Kriol-speaking communities (see below). English has had little foothold in the community, perhaps due to its late introduction. It is not entirely clear when a school was established in Kalkaringi but probably not before the 1960s. Most access to English before then was in the limited communication Gurindji people had with station people who, in any case, mostly addressed Gurindji people using the cattle pidgin.

1.2 Kriol

Kriol is an English-lexifier creole language and the first language of most Aboriginal people across the Top End of Australia with the exception of northern Arnhem Land and the Daly River region (Munro, 2000; Sandefur and Harris, 1986). Kriol-speaking communities include Ngukurr (where Roper River Kriol, the variety discussed in this paper, originated), Beswick, Barunga, Bulman, Katherine, Timber Creek, Bulla and Amanbidji (Fig. 1). Kriol is now the main language of these communities, with traditional Australian languages rarely used except by the oldest generations. English is the second most widely used language in most of these places, although it is only learnt when children enter school. As in Kalkaringi, all education and government services are provided in English.

Figure 1. Towns and Aboriginal communities in northern Australia

Citation: Journal of Language Contact 11, 1 (2018) ; 10.1163/19552629-01101003

Structurally, Kriol is an isolating language with little bound morphology; for example, core arguments are differentiated using word order or marked by prepositions. Similarly, tense, mood, and aspect (TAM) categories are expressed through auxiliary verbs rather than inflections (Sandefur, 1979). The lexicon of Kriol is almost entirely derived from English, with a small amount of vocabulary maintained from surrounding substrate languages, in particular Marra (Dickson, 2016). Some examples are given below.


Kriol originated in NSW Pidgin and spread north to Queensland and the Northern Territory in the early 1900s through the pastoral industry (via Aboriginal labour imported from Queensland) and nativised in different places (see Meakins, 2014; Sandefur and Harris, 1986; Simpson, 2000 for an overview). One of the earliest varieties of this cattle station pidgin to nativise was Roper River Kriol at Roper River Mission (now Ngukurr) in the early 1900s. Roper River Mission was established as a refuge for Aboriginal people from nine different language groups including Alawa, Marra, Warndarrang, Ngalakgan, and Ngandi who were escaping massacres. Most Aboriginal people were fluent in two or more of these languages. In addition, they would have spoken the pidgin English that arose from interaction with the colonists at least 30 years prior to the establishment of the mission. For many Aboriginal people at the mission, the cattle station pidgin became their lingua franca, with traditional languages reserved for in-group communication. The mission also separated children from their parents so a combination of community-level multilingualism and lack of access to traditional languages most likely contributed to the formation of Kriol (rather than a mixed language, as was the case for Gurindji Kriol). The presence of English was also strongly felt in the mission with children taught in English right from its establishment in the early 1900s (Harris, 1986). In this respect, Ngukurr is a community that has around 50 years and two generations more contact with English than Kalkaringi where Gurindji Kriol developed.

1.3 Stop Consonants

It has been shown that listeners weight relevant cues encoded in the speech stream to identify contrasts (Lisker, 1986; Scobbie, 1988). Some cues are given priority over others and experiments involving the removal of specific cues (e.g., vowel duration vs. spectral cues in English /i/ vs. /ɪ/ (Escudero, 2000)) can reveal the importance or weight of such cues. Escudero (2000) reveals that spectral cues in the tense/lax high front vowel pair in English take priority over duration.

When languages have a distinction between stop consonants in the same place of articulation, one of the primary cues used to distinguish such categories involves voice onset time (VOT). This cue refers to the temporal duration from the moment of release of the closure to the onset of voicing in the following vowel (Lisker and Abramson, 1964). When a stop series is contrastive, it often conforms to one of three patterns: voiced, voiceless unaspirated, and aspirated (Keating, 1984). While the differences in duration are language specific, voiceless aspirated stops ([pʰ, tʰ, kʰ]), like those found in word-initial position in English, Australian Kriol, and Gurindji Kriol, are shown to have overall longer durations compared to voiceless unaspirated stops ([p, t, k]). The VOT of stop consonants, like those found in French dialects (Caramazza and Yeni-Komshian, 1974; Hoonhorst et al., 2009), can also be negative, meaning vocal fold vibration begins before release. English contrasts aspirated and unaspirated stops, and speakers interpret the latter, both phonemically and orthographically, as <b, d, g>, although they are not true voiced stops in the sense that voicing begins post-release.
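The definition of VOT given above can be illustrated with a small computation. The following is a minimal sketch only, assuming hand-annotated release and voicing-onset times for each token; the names `StopToken`, `vot_ms`, and `vot_category` are hypothetical, and the 30 ms short-lag ceiling is an arbitrary illustrative value rather than a boundary reported in this study, since such boundaries are language specific.

```python
from dataclasses import dataclass


@dataclass
class StopToken:
    """One annotated stop (hypothetical structure, for illustration only)."""
    release_s: float        # time of closure release, in seconds
    voicing_onset_s: float  # time at which vocal fold vibration begins, in seconds


def vot_ms(token: StopToken) -> float:
    """VOT = voicing onset minus release; negative when voicing leads the release."""
    return (token.voicing_onset_s - token.release_s) * 1000.0


def vot_category(vot: float, short_lag_max_ms: float = 30.0) -> str:
    """Assign a VOT value (in ms) to one of the three patterns.

    The 30 ms default is an assumption made for this sketch; real
    category boundaries differ from language to language.
    """
    if vot < 0:
        return "voiced (voicing lead)"
    if vot <= short_lag_max_ms:
        return "voiceless unaspirated (short lag)"
    return "aspirated (long lag)"


if __name__ == "__main__":
    aspirated = StopToken(release_s=0.100, voicing_onset_s=0.165)
    prevoiced = StopToken(release_s=0.100, voicing_onset_s=0.020)
    for tok in (aspirated, prevoiced):
        value = vot_ms(tok)
        print(round(value, 1), vot_category(value))
```

Keeping the threshold as a parameter reflects the point made in the text that the durational differences between categories are language specific.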

Other secondary cues thought to be involved in stop production and perception include pitch (F0) depression after voiced stops (Abramson and Lisker, 1973). This can be observed as a decrease in the fundamental frequency right after release. Another secondary cue involves the loss of the initial transition of the first formant (F1) in vowels following a voiceless stop (known as F1 cutback) (Liberman, Delattre, and Cooper, 1958; Lisker and Abramson, 1964). The duration of the post-stop vowel has also been shown to correlate with stop voicing contrasts (Miller and Dexter, 1988; Summerfield, 1981).

1.4 Stop Production under Contact

While a substantial number of studies investigate the effects of bilingualism on VOT values compared to those of monolinguals (MacLeod and Stoel-Gammon, 2005 for French-English; Delano, 2012 for Spanish-Creole English; Flege, 1991 for Spanish-English; Kehoe, Lleó, and Rakow, 2004 for Spanish-German; inter alia), studies that examine sound production in lexical borrowings in monolingual speech are only now emerging. Those described here all come from the mixed language or Kriol literature and all suggest that phonology, like the lexicon and grammar of a language, also does not conform to any clear systematic paradigmatic patterns in situations of borrowing but rather variation is commonplace, perhaps as an intermediate step in the development of a system.

Specifically related to this study, Jones and Meakins (2013) look at VOT production in Gurindji Kriol and Northern Australia English. Unlike English, traditional Gurindji does not have a voicing contrast in the stop series, which consists of [p, t, c, k]. One particularly relevant finding to this study describes VOT variation in Kriol-derived and Gurindji-derived words produced by adult speakers of Gurindji Kriol. Here, they tested whether the values systematically relate to those in English cognates. Based on data gathered from a picture naming task and natural speech, their results show that there is little effect of English voicing in Gurindji Kriol among words of Kriol or Gurindji origin in word-initial position, although there is some degree of variability (Jones and Meakins, 2013: 216).

These findings raise the questions: How are stops categorically perceived in Gurindji Kriol and is there any variation based on age or exposure to Australian English? And how do their results compare with those of Kriol? Based on impressionistic data, Kriol has been described as not having a stop voicing contrast, at least not in basilectal varieties, in existing published literature (Hudson, 1985; Munro, 2004; Sandefur, 1979) as well as in recent surveys (Butcher, 2008; Schultze-Berndt, Meakins, and Angelo, 2013). However, Bundgaard-Nielsen and Baker (2016) and Baker, Bundgaard-Nielsen, and Graetzer (2014), show that second and third generations of monolingual Roper Kriol speakers both produce and perceive stop-voicing contrasts ([p-b, t-d, k-g]) while first generation speakers show variability. For Gurindji Kriol, Jones and Meakins (2013) show that Gurindji Kriol speakers tend to assimilate any form of stop voicing perceptually to that of Gurindji’s phonological system though there is some degree of variation. What makes this situation worthy of further investigation, however, is the fact that variation between the voiced and voiceless series shows that speakers are at least able to make the correct articulatory gestures needed to produce such sounds. This means speakers might be able to take advantage of such variability perceptually when needed (e.g., under ambiguous conditions such as contrasting minimal pairs out of context e.g., boring/poring in the phrase Nyantu-ma i bin tok im rili poring/boring ‘She said it’s really pouring/boring’).

Because Kriol, as it is spoken at Ngukurr, developed earlier than Gurindji Kriol and has been in contact with English (which has a clear stop voicing contrast) for longer and more intensively through an extended period of schooling, we might expect Kriol listener perception to be more contrastive than their Gurindji Kriol counterparts. Through constant modern day contact with English, however, both languages may be adopting the stop voicing contrast—Kriol in all parts of speech and the Kriol origin lexicon in Gurindji-Kriol (e.g., pak and bak from English ‘park’ and ‘bark’ may be perceived as distinct instead of both defaulting to homonym pak). If the adoption process was merely for sociolinguistic reasons, we would expect a quicker diffusion of the contrast as speakers would be made consciously aware of the difference. However, an incremental and variable change may signify the structure of the language is benefiting from adopting the contrast (e.g., reducing functional load of the voiceless series that might level out phoneme frequency and distribution allowing for a greater number of contrasts leading to greater phonological optimization (Surendran and Niyogi, 2006; Wedel, Kaplan, and Jackson, 2013)). Regarding perception, there are four primary outcomes that will reveal how stop consonants are categorized in the phonology of these languages: (1) the voiced series assimilates to the voiceless series, (2) both series are perceptually contrastive, (3) both series exist in free variation, and (4) the voiceless series is established while the voiced series is in flux.

Bundgaard-Nielsen and Baker (2016) show that for elicited stops from three Roper Kriol speakers, there is a clear contrast between voiced and voiceless stop production in the English origin lexicon. For spontaneous speech data from a single speaker, there also appears to be a contrast, though their results are non-significant, a result they claim is due to the small number of tokens. Moreover, they also show variability in stop voicing production in a Kriol dominant Wubuy L1 speaker that suggests Wubuy speakers make use of a single stop category regarding voicing. With respect to perception, Bundgaard-Nielsen and Baker (2015) showed that Wubuy listeners had a difficult time discriminating between both English and Kriol labial stops that differed in VOT duration, a result they attribute to the lack of native experience in dealing with the voicing contrast. The adoption of the stop voicing contrast by Kriol speakers might be expected before that in Gurindji Kriol since the functional load of the contrast would affect the entire Kriol lexicon rather than just the Gurindji Kriol verb phrase elements.

In Media Lengua, a lexicon-grammar (LG) mixed language (Matras and Bakker, 2003; Meakins, 2013b) spoken in Ecuador, with Imbabura Quichua systemic elements and an Ecuadorian Rural Spanish-derived lexicon, Stewart (2015) showed the Spanish voiced stop series has been adopted, both productively and perceptually, by Quichua and Media Lengua speakers with varying ages and levels of Spanish proficiency. The VOT values of these adopted stops, however, are longer in duration than their original Spanish counterparts, suggesting some degree of overshoot during acquisition. For the Quichua speakers, a significant number of stops also undergo variable weakening to [β, ð, ɣ]. Stewart (forthcoming, 2014) also claims a similar tendency for Spanish-derived vowels in both Quichua and Media Lengua.

Based on the differences in formation between these two mixed languages (code-switching in Gurindji Kriol (McConvell and Meakins, 2005) versus relexification in Media Lengua (Muysken, 1981)) and the type of splits (50/50 Gurindji and Kriol lexicon in Gurindji Kriol (Meakins, 2011: 11) versus 10/90 Quichua and Spanish lexicon respectively in Media Lengua (Muysken, 1997)), the amount of ‘weight’ placed on the phonological system in Media Lengua by Spanish may have been large enough to warrant adopting the series, while this might not have been the case in Gurindji Kriol. To illustrate this point, in Michif, which, like Gurindji Kriol, is a V(erb)-N(oun) mixed language (Bakker, 2003: 122; Meakins, 2013a: 179), with Cree-derived verb phrases and French-derived noun phrases, Rosen, Stewart, and Cox (2016) show that speakers have actually only adopted a small number of French vowels while the rest assimilate to their Cree counterparts.

It should be noted that there have been attempts to systematically categorize these phonological processes. Van Gijn (2009) provided an in-depth analysis suggesting that mixed languages borrow phonological material based on the type of lexical and grammatical material they adopt. Here, a language with a lexical-grammar split, where the lexicon of one language and the grammar from another combine to make a new language (e.g., Media Lengua, Ma’a), should share lower level material such as individual segments since phrases are more likely to be made of individual linguistic parts from each language. On the other hand, noun-verb mixed languages, which borrow lexical items categorically (e.g., Gurindji Kriol, Michif), should maintain language-specific phonological material at levels higher than the segment since entire phrases may be of a single source language. Recent studies referenced above that explore the phonetic properties of these sound systems, however, paint a more complex picture involving mergers, near-mergers, segments with substantial overlap in acoustic space, and category maintenance. While some of these patterns align with Van Gijn’s (2009) analysis, the degree of alignment can seem peculiar (e.g., vowel spaces with such a high degree of overlap that they would seem to have little perceptual benefit to listeners). Other patterns (e.g., the number of actual French vowels borrowed in Michif) do not align with Van Gijn’s hypotheses.

Turning briefly to the bilingual literature, Pasquale (2005) revealed that when speaking Quechua, Quechua-Spanish bilinguals dominant in Quechua produced overall shorter VOT values than Quechua monolinguals, values that trended towards Spanish-like production. Spanish-dominant bilinguals, on the other hand, showed no noticeable shift toward Quechua-like VOT production when speaking Quechua. MacLeod and Stoel-Gammon (2005) suggest simultaneous French-English bilinguals produce VOT with French monolingual-like values, which also carried over into their English VOT production. Flege (1991) shows that Spanish-English late bilinguals produced the VOT values of /t/ in between those of standard monolingual Spanish unaspirated values and monolingual English aspirated values. On the other hand, early bilinguals (Spanish L1, English L2) produced VOT values that matched those of English monolinguals. These findings suggest that, for the most part, simultaneous and early bilinguals typically maintain separate L1 and L2 VOT values while late bilinguals usually do not reach native-like VOT production in their L2. Similarly, Chang, Yao, Haynes, and Rhodes (2011) show that the younger a heritage speaker is when exposed to both languages, the more successful they will be at maintaining distinctions within and across their languages.

Beyond this clear effect of age of acquisition, studies also show that language exposure (use, length of residence, practice, etc.) is a relevant factor in sound production and perception. Flege, Takagi, and Mann (1996) show Japanese speakers living in the US for 21+ years were able to identify liquids with higher consistency compared to Japanese speakers who only lived there for 2 years. About 10% of the improvement in the production of the English [e͜ɪ] diphthong by Italian speakers (with native [e]) could be attributed to the frequency of a speaker’s L2 usage, suggesting that practice can improve production in adult speakers (Flege, Schirru, and MacKay, 2003). At the same time, Flege and Liu (2001) conclude that for adults, length of residency is not enough to improve L2 performance. Instead, improvement is only measurable if a speaker receives constant input from L1 speakers. Finally, Klein (2013) shows that French and Mandarin L1 speakers with a substantial length of residency in an English-speaking area tend to produce more native-like English voiced stops. These findings might be applied to the Gurindji Kriol and Kriol context as a way to understand the roles of age and length of exposure to English in the formation of these languages.

1.5 Categorical Perception

Since Liberman (1957), researchers have been aware that humans (and later other animals (Kuhl and Iverson, 1995; Kuhl and Miller, 1979)) perceive individual speech sounds as homophonous-like categories, meaning distinct sounds within a single category are perceived as similar while neighbouring sounds in a separate category are perceived as distinct, even if cross-category sounds are closer in acoustic space. For bilingual listeners, however, the categorization of phonemes is more complex and varies based on age of acquisition of the L2. It is often thought that listeners establish phonemic categories within the first year of life (Kuhl, 2004; Werker and Tees, 1984), yet the organization of such categories for bilinguals has been shown to be distinct from their monolingual counterparts. Caramazza, Yeni-Komshian, Zurif, and Carbone (1973) show that for simultaneous and early bilinguals, a single intermediate boundary in VOT perception was established for both a listener’s L1 and L2. Bosch, Costa, and Nuria (1997), however, show that L1 phonemic categories of early bilinguals remain essentially unchanged even when exposed to similar categories in the L2. On the other hand, Hazan and Barrett (2000) suggest the refinement of phonemic categories can take place until adolescence. Furthermore, Guion (2003) established that simultaneous bilinguals maintain separate categories even when faced by sounds that have the same phonemic function and articulatory shape across both languages (e.g., Spanish /i/ and Quichua /i/). For early bilinguals, however, these sounds merged while late bilinguals, who typically acquired Spanish under ‘unguided’ conditions, also merged the Spanish mid-vowels with Quichua high vowels (Quichua being a three vowel system consisting of /i, u, a/, Spanish consisting of /i, u, e, o, a/).
Although these studies may differ as to when categories become solidified, it is clear that simultaneous and early bilinguals have distinct categorical arrangements compared to late bilinguals who rely on their L1 for perceptual cues in both their languages.

When investigating categorical perception of speech sounds, two common task-based experiments are often implemented: identification-based and discrimination-based. The first involves identifying sounds as belonging to a given category, often presented in a forced choice format. In such an identification task, modified audio tokens along a continuum between two canonical phonemes might be presented at random and participants would be asked to label the audio stimuli by selecting a corresponding image/text or with a gestural/oral response. Two-alternative forced choice experiments, similar to the one presented in this paper, are considered advantageous for identifying categories for several reasons: (1) they are considered simple tasks for participants to complete, (2) they minimize bias as participants are only given two options for identification where one is known to be correct, (3) distributional assumptions are typically not necessary (McGuire, 2010), and (4) according to Borden, Harris, and Raphael (1994) categorical boundaries can be estimated if the stimuli are contrastive. Pitfalls to this experimental method involve asking participants to sit through a lengthy experiment with a large number of trials that may become monotonous. The stimuli also need to be explicitly defined for the participants, which may require a brief training session.
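The logic of an identification task of this kind can be sketched in a few lines. The example below is a hedged illustration and not the procedure or analysis used in this study: it assumes a numbered continuum, a fixed number of repetitions per step, and a simple linear interpolation to locate the 50% crossover. The function names `build_trials`, `proportion_voiceless`, and `crossover` are hypothetical, as are the default parameter values.

```python
import random
from collections import defaultdict


def build_trials(pairs, n_steps=10, repetitions=2, seed=0):
    """Return every (pair, step) combination, repeated and shuffled.

    Steps are numbered 1..n_steps along the continuum; the repetition
    count and seed are illustrative assumptions.
    """
    trials = [
        (pair, step)
        for pair in pairs
        for step in range(1, n_steps + 1)
        for _ in range(repetitions)
    ]
    random.Random(seed).shuffle(trials)  # random presentation order
    return trials


def proportion_voiceless(responses):
    """Map each continuum step to the proportion of 'voiceless' choices.

    `responses` is an iterable of (step, chose_voiceless) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # step -> [voiceless choices, total]
    for step, chose_voiceless in responses:
        counts[step][0] += int(chose_voiceless)
        counts[step][1] += 1
    return {step: hits / total for step, (hits, total) in sorted(counts.items())}


def crossover(proportions, level=0.5):
    """Estimate where the response curve crosses `level` by linear
    interpolation between adjacent steps; return None if it never does."""
    steps = sorted(proportions)
    for lo, hi in zip(steps, steps[1:]):
        p_lo, p_hi = proportions[lo], proportions[hi]
        if p_lo != p_hi and (p_lo - level) * (p_hi - level) <= 0:
            return lo + (level - p_lo) * (hi - lo) / (p_hi - p_lo)
    return None


if __name__ == "__main__":
    trials = build_trials([("pak", "bak")])
    print(len(trials), "trials; first three:", trials[:3])
```

A listener with a categorical contrast would be expected to show proportions near 0 at one end of the continuum and near 1 at the other, whereas responses that hover around chance at every step yield no reliable crossover.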

While not used in this study, it is worth briefly mentioning discrimination task-based experiments, which are also often implemented for labelling categories perceptually (see e.g., Bundgaard-Nielsen and Baker, 2015). One common construction involves an AX design where participants are asked to label two audio samples as same or different. The benefits of a discrimination task experiment of this nature involve (1) a smaller number of trials compared to identification tasks and (2) the ability to pinpoint categorical boundaries. There are two main disadvantages to this experiment type. The first is a bias towards the same response when pairs are more difficult to contrast. The second is a substantial number of trial rejections with the same response due to their uninterpretability (McGuire, 2010).

2 Methodology

This section details our two-alternative forced-choice (2afc) identification task experiments (Section 2.1), including the stimuli used, how the continua were designed (Section 2.1.1), and how they were presented (Section 2.1.2). It also reports the demographic information provided by the participants (Section 2.2) and the procedures used to implement the experiments (Section 2.3).

2.1 2afc Identification Tasks

The primary 2afc identification task used in this experiment was designed to look for stop voicing contrasts based on the intuitions of native speakers of both Gurindji Kriol and the Roper dialect of Kriol. This task-based experiment made use of Kriol lexical borrowings that also make up part of the Gurindji Kriol lexicon. In addition, a simplified version of the experiment was designed to test the intuitions (devised from a listener’s native experience with their L1 phonology) of Gurindji Kriol and Kriol speaking children typically younger than 8 years of age and Gurindji and Kriol older adults typically over the age of 65 (see Section 2.2 for further details). The goal of this experiment was not to seek out categorical boundaries, but rather to simply learn whether listeners in these age ranges could identify voicing contrasts in word-initial stops. Going forward, we refer to these 2afc identification task experiments as the ‘standard experiment’ (for preteens to middle-aged adult participants) and the ‘simplified experiment’ for participants with limited exposure to Standard English (young children under the age of approximately 8 and older adults over the age of approximately 65).

2.1.1 Stimuli

To gather stop perception data for our standard experiment, we used seven minimal pairs that contrast in the voicing of the word-initial stop in Kriol (e.g., pak ‘park’ and bak ‘bark’). Each minimal pair has its origins in Kriol and has made its way into the Gurindji Kriol lexicon, where it maintains a nearly identical phonological shape. Table 1 presents the stimuli used in our word identification task. Both the [t-d] and [k-g] series contain two minimal pairs each, while the [p-b] series contains three. For the simplified experiment, the minimal pairs traiyimat-draiyimat and katim-gatim were removed.


Instead of using synthetic audio tokens for the stimuli, we chose to modify natural speech tokens to minimize issues with quality that have been attributed to synthetic speech (Vainio, Järvikivi, Werner, Volk, and Välikangas, 2002). For both experiments, one of the authors, a female speaker of Roper River Kriol from Ngukurr (a Kriol-English bilingual) with a clear stop voicing contrast, produced the minimal pairs in Table 1. An Edirol R09 portable digital recorder with a Sony lapel mic (40–20,000 Hz response) was used to record the stimuli at a sample rate of 44.1 kHz. After rendering the recordings to 16-bit stereo wav format, we manually modified several primary and secondary acoustic cues known to carry weight in stop perception. These included the voice onset time of each word-initial stop, the formant transitions at the onset of voicing, and the overall pitch and duration of the following vowel. The removal of aspiration during the modification of the vot took place immediately following the release burst to ensure consistency in the resulting token stimuli. We then combined any remaining portion of the original voiced minimal pair token to create a more naturalistic sound sample. Each continuum was tested for naturalness before it was integrated into the experiment.

For the standard experiment, we chose to modify the sound tokens along a 10-step continuum that transitioned gradually from the word-initial voiceless stop minimal pair to its word-initial voiced stop counterpart to cover a reasonably large range of token samples. We hypothesized that if a participant did indeed contrast the minimal pairs, identification consistency would be reduced as the modified values became more distant from their prototypical form along the continuum. On the other hand, if a participant perceived the minimal pairs as the same, we would expect random responses throughout the continua. If, however, only one token appears phonemically in a participant’s inventory, we hypothesized that she/he would accurately identify tokens at one end of the continuum, while responses at the other end would be more randomized. If a voicing contrast is indeed found, the 10-step continua will allow us to home in on the categorical boundaries of each minimal pair.
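The three hypothesized response patterns can be made concrete by tabulating, for each continuum step, the share of trials on which a listener chose the voiceless member of the pair. The following Python sketch is an illustration only and is not part of the original analysis; the function name and the toy responses are hypothetical. Under the hypotheses above, a listener with a contrast would be expected to show proportions near 1 at step 1 and near 0 at step 10, a listener without a contrast would hover around 0.5 throughout, and a listener with only one category would be consistent at one end and near 0.5 at the other.

```python
from collections import defaultdict


def proportion_voiceless(responses):
    """Map each continuum step to the share of 'voiceless' responses.

    `responses` is an iterable of (step, answer) tuples, where answer is
    either "voiceless" or "voiced" (hypothetical coding, for illustration).
    """
    totals = defaultdict(int)
    voiceless = defaultdict(int)
    for step, answer in responses:
        totals[step] += 1
        if answer == "voiceless":
            voiceless[step] += 1
    return {step: voiceless[step] / totals[step] for step in sorted(totals)}


# Made-up responses for one listener and one minimal pair (not real data).
toy_responses = [
    (1, "voiceless"), (1, "voiceless"),
    (5, "voiceless"), (5, "voiced"),
    (10, "voiced"), (10, "voiced"),
]
print(proportion_voiceless(toy_responses))  # {1: 1.0, 5: 0.5, 10: 0.0}
```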

For the simplified experiment, we opted to do away with the continua and only use the canonical tokens of each minimal pair (i.e., those used at step 1 and step 10 in the standard experiment). For the standard experiment, all the aforementioned modified values were evenly spaced along the continuum as per the original values from the voiceless and voiced stop tokens. All modifications took place using the open source programs Praat version 6.0.8 (Boersma and Weenink, 1996) and Audacity 2.1.0 (Ash, Chinen, Dannenberg, Johnson, and Martyn, 2012). Praat scripts to help automate portions of the token modification process were written by the authors. A sample of the resulting values for the poring-boring ‘pouring-boring’ continuum is provided in Table 2. The poring-boring values of the simplified experiment are the same as those in step 1 and step 10 in Table 2.
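Even spacing between the two canonical tokens amounts to linear interpolation between the voiceless endpoint (step 1) and the voiced endpoint (step 10). The Python sketch below shows the arithmetic under that assumption. It is not the authors' Praat script, the function name is hypothetical, and the endpoint values are placeholders rather than the measurements reported in Table 2.

```python
def linear_continuum(voiceless_value, voiced_value, steps=10):
    """Return `steps` evenly spaced values from the voiceless endpoint
    (step 1) to the voiced endpoint (last step)."""
    if steps < 2:
        raise ValueError("a continuum needs at least two steps")
    increment = (voiced_value - voiceless_value) / (steps - 1)
    return [voiceless_value + i * increment for i in range(steps)]


# Placeholder endpoints (NOT the values in Table 2): vot in milliseconds.
vot_steps = linear_continuum(voiceless_value=60.0, voiced_value=10.0)
for step, value in enumerate(vot_steps, start=1):
    print(f"step {step}: vot = {value:.1f} ms")
```

The same function could be applied to each modified cue in turn (for example vowel duration or pitch), given that cue's measured endpoint values.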


2.1.2 Presentation

In this section we describe the user interface of the experiment. For the standard experiment, the 10 tokens of each minimal pair, described in Section 2.1.1, were placed in a Microsoft PowerPoint presentation along with images corresponding to each minimal pair (Fig. 2).

Figure 2

Three slides from the identification task. The top left slide shows the minimal pairs bak ‘bark’ and pak ‘park’. The middle slide shows the minimal pairs kol ‘coal’ and gol ‘gold’. The bottom right slide shows the minimal pairs pai ‘pie’ and bai ‘baby sleep’.

Citation: Journal of Language Contact 11, 1 (2018) ; 10.1163/19552629-01101003

To attain more accurate results, we designed the presentation for the standard experiment so that stimuli more distant from the canonical forms were repeated more often. The vot production values from Jones and Meakins (2013) provided an additional basis for determining which tokens should be repeated (Table 3). As a result, the participants listened to each minimal pair series along the continuum 16 to 17 times, for a total of 115 token samples across all seven minimal pairs. Table 3 provides an example of the repeated tokens along the continuum.
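The reported totals can be checked with simple arithmetic. The Python sketch below assumes that every minimal pair was presented either exactly 16 or exactly 17 times and that the figure of 115 excludes distractor and practice tokens. Both points are assumptions made for illustration, and the function name is hypothetical.

```python
def split_counts(total_tokens, n_pairs, low, high):
    """Find how many pairs receive `low` and how many receive `high`
    presentations so that the counts add up to `total_tokens`."""
    for n_high in range(n_pairs + 1):
        n_low = n_pairs - n_high
        if n_low * low + n_high * high == total_tokens:
            return n_low, n_high
    return None  # no split of this kind matches the total


n_low, n_high = split_counts(total_tokens=115, n_pairs=7, low=16, high=17)
print(n_low, n_high)  # 4 3
```

Under these assumptions, four pairs presented 16 times and three pairs presented 17 times give 4 × 16 + 3 × 17 = 115 tokens.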


For both experiments, we configured the PowerPoint presentation to play each token 50 ms after each new slide appeared on the screen. Participants were given the option to repeat the audio sample if they so desired. They could also take as much time as they saw fit to respond to the stimuli, as there was no form of time pressure. The presentation was configured to use ‘Kiosk’ mode, which restricted where the participant could click on the screen to move to the following slide. As a result, the participant had to click one of the two images to advance, which avoided accidental clicks on the surrounding areas that would otherwise not record a response. Each image was scripted using the Visual Basic for Applications (vba) add-on in PowerPoint to record the participant’s individual response for each slide. To avoid any type of pattern recognition in the data, the slides were reordered using a randomization macro. The slides were then further adjusted to make sure no two consecutive slides contained the same images. At the beginning of the experiment, two trial tokens were presented to introduce the participants to the experiment. Finally, for the standard experiment, one slide containing an audio sample from step 10 was placed at the beginning of the presentation. This provided the participants with a canonical form to get their bearings before being presented with non-canonical forms at random. All the participants heard the stimuli in the same randomized order. For both experiments, distractor tokens involving stop-fricative minimal pairs, produced by the same speaker, were added to reduce the constant repetition of the same seven minimal pairs. All instructions were given in the participants’ L1 (Gurindji Kriol and Kriol respectively). At the end of the experiment, a text file was created containing all the participant’s responses and the demographic information collected on the first slide.
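The ordering constraints described above (a pseudo-random order, no two consecutive slides showing the same images, and the same order for every participant) can be sketched in a few lines. The Python example below is an illustration under stated assumptions and is not the vba macro used in the experiment. The function name, the seed value, and the trial list are hypothetical, and the three pair labels are simply those shown in Figure 2. Instead of shuffling and then adjusting by hand, this sketch builds the order one trial at a time and never picks the pair that was just shown.

```python
import random
from collections import Counter


def order_trials(trials, seed=2018):
    """Return the (pair_label, step) trials in a pseudo-random order in
    which no two consecutive trials belong to the same minimal pair."""
    rng = random.Random(seed)  # fixed seed: every participant gets the same order
    remaining = list(trials)
    counts = Counter(pair for pair, _ in remaining)
    if remaining and 2 * max(counts.values()) > len(remaining) + 1:
        raise ValueError("one pair is too frequent to avoid back-to-back repeats")

    ordered = []
    previous_pair = None
    while remaining:
        # A pair holding more than half of the remaining trials must go
        # next, otherwise it would be forced into adjacent slots later.
        forced = [p for p, c in counts.items() if 2 * c > len(remaining)]
        if forced:
            candidates = [t for t in remaining if t[0] == forced[0]]
        else:
            candidates = [t for t in remaining if t[0] != previous_pair]
        choice = rng.choice(candidates)
        remaining.remove(choice)
        counts[choice[0]] -= 1
        ordered.append(choice)
        previous_pair = choice[0]
    return ordered


# Hypothetical trial list: three of the minimal pairs, ten steps each.
pairs = ["pak-bak", "kol-gol", "pai-bai"]
trials = [(pair, step) for pair in pairs for step in range(1, 11)]
ordered = order_trials(trials)
print(ordered[:5])
```

Because the random number generator is seeded, calling `order_trials` again with the same trial list reproduces the same order, which corresponds to all participants hearing the stimuli in one fixed randomized sequence.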

2.2 Participants