Search Results

Authors: John Nerbonne and Tony Mullen

Abstract

In this paper we argue that certain nominal phrase constructions in German and English are best considered as having empty lexical heads. We propose a feature LP, which gives the status of the LEFT PERIPHERY of a nominal tree structure as one of three values: empty, full, or one. A number of simple language-specific rules govern the combination of signs in terms of their LP values. For example, determiners such as none or mine are restricted to combining with N̄ constituents whose left periphery is empty [LP empty], while no and my require [LP full]. The feature provides a simple general explanation of a number of related phenomena wherein determiners or adjectives appear to “carry the weight” of DPs, including a variety of German DP constructions, certain possessive constructions in both English and German, and generics. The broad descriptive power of this feature argues that it is not an ad hoc solution. In order to justify it further, we investigate alternative explanations for the same phenomena, without using the LP feature, and argue that these approaches introduce unnecessary ambiguity and other complications.
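The combination rule the abstract describes can be sketched as a lexical lookup: each determiner selects the LP value of the nominal it combines with. This is a minimal illustrative sketch, assuming a dictionary-based lexicon; the function names and entries are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of the LP (left-periphery) feature described above.
# The lexical entries and function names are illustrative assumptions.

# Each determiner lexically selects the LP value of the nominal it combines with.
DETERMINER_LP = {
    "none": "empty",   # "none"/"mine" combine with a nominal whose left periphery is empty
    "mine": "empty",
    "no":   "full",    # "no"/"my" require [LP full]
    "my":   "full",
}

def can_combine(determiner: str, nominal_lp: str) -> bool:
    """Return True if the determiner's LP requirement matches the nominal's LP value."""
    return DETERMINER_LP.get(determiner) == nominal_lp

# "I have none" (empty nominal) is licensed; "no" instead needs a full nominal ("no book").
assert can_combine("none", "empty")
assert not can_combine("no", "empty")
assert can_combine("my", "full")
```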

In: Computational Linguistics in the Netherlands 1998

Abstract

We discuss experiments with neural networks trained on a phonotactic processing task. A recurrent network not only learns to predict the next letter given a partially processed word, but also learns to represent the letters in a manner meaningful to the processing task. To this end, we use Miikkulainen’s (1993) FGREP, augmented with an algorithm we call dispersion, to improve distinctness among the set of letter representations.

Our goal is to create a more realistic model of how humans might process natural language.

In: Computational Linguistics in the Netherlands 1998

Abstract

Stoianov, Nerbonne and Bouma (1998) trained Simple Recurrent Networks (SRNs) on the graphotactics of Dutch monosyllabic words, overcoming shortcomings of previous implementations. The current report is a continuation of our earlier research, but using phonetic data representations instead of orthographic ones, that is, learning phonotactics. In addition, we conducted further analysis of neural network performance with respect to variables such as word frequency, length, neighbourhood density and error location. The results are compared with reported psycholinguistic analyses. This informal comparison of SRNs and human performance suggests that SRNs can be used for modeling natural language processing.

In: Computational Linguistics in the Netherlands 1998
Volume Editors: Dicky Gilbers, John Nerbonne, and Jos Schaeken
The present volume includes papers that were presented at the conference Languages in Contact at the University of Groningen (25-26 November 1999). The conference was held to celebrate the University of St. Petersburg’s award of an honorary doctorate to Tjeerd de Graaf of Groningen. In general, the issues discussed in the articles involve pidgins and creoles, minorities and their languages, Diaspora situations, Sprachbund phenomena, extralinguistic correlates of variety in contact situations, problems of endangered languages and the typology of these languages. Special attention is paid to contact phenomena between languages of the Russian Empire / USSR / Russian Federation, their survival and the influence of Russian.

We have documented language varieties (either Turkic or Indo-European) spoken in 23 test sites by 88 informants belonging to the major ethnic groups of Kyrgyzstan, Tajikistan and Uzbekistan (Karakalpaks, Kazakhs, Kyrgyz, Tajiks, Uzbeks, Yaghnobis). The recorded linguistic material concerns 176 words of the extended Swadesh list and will be made publicly available with the publication of this paper. Phonological diversity is measured by the Levenshtein distance and displayed as a consensus bootstrap tree and as multidimensional scaling plots. Linguistic contact is measured as the number of borrowings, from one linguistic family into the other, according to a precision/recall analysis further validated by expert judgment. Concerning Turkic languages, the results of our sample do not support Kazakh and Karakalpak as distinct languages and indicate the existence of several separate Karakalpak varieties. Kyrgyz and Uzbek, on the other hand, appear quite homogeneous. Among the Indo-Iranian languages, the distinction between Tajik and Yaghnobi varieties is very clear-cut. More generally, the degree of borrowing is higher than average where language families are in contact in one of the many sorts of situations characterizing Central Asia: frequent bilingualism, shifting political boundaries, ethnic groups living outside the “mother” country.

In: Language Dynamics and Change

With an eye toward measuring the strength of foreign accents in American English, we evaluate the suitability of a modified version of the Levenshtein distance for comparing (the phonetic transcriptions of) accented pronunciations. Although this measure has been used successfully inter alia to study the differences among dialect pronunciations, it has not been applied to studying foreign accents. Here, we use it to compare the pronunciation of non-native English speakers to native American English speech. Our results indicate that the Levenshtein distance is a valid native-likeness measurement, as it correlates strongly (r = -0.81) with the average “native-like” judgments given by more than 1000 native American English raters.
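The core measure in the two abstracts above can be sketched in a few lines: a standard Levenshtein distance over segment sequences, normalized to [0, 1]. Normalizing by the longer transcription is one common choice among several; the transcriptions below are invented examples, not data from the study, and the exact modifications the authors made to the distance are not reproduced here.

```python
# Sketch of a Levenshtein-based pronunciation distance.
# Normalization scheme and example transcriptions are illustrative assumptions.

def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution / match
        prev = cur
    return prev[-1]

def pronunciation_distance(a, b):
    """Length-normalized distance between two segment sequences, in [0, 1]."""
    return levenshtein(a, b) / max(len(a), len(b), 1)

# Comparing two invented transcriptions (native vs. accented):
native = list("æftərnun")
accented = list("aftərnon")
print(pronunciation_distance(native, accented))  # → 0.25 (two substitutions over 8 segments)
```

Averaging such distances over a word list yields a per-speaker score, which is the kind of quantity the abstract reports as correlating (r = -0.81) with native-likeness judgments.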

In: Language Dynamics and Change