Invariant tags, such as huh and innit, are discourse markers that often occur at the end of an utterance to provide attitudinal and/or evidential information above that of the proposition. Many previous studies examined the meaning or usage of these tags in single varieties or dialects of English. Few of these studies, however, have examined variation in invariant tag use. Some studies have investigated sociolinguistic divisions within a dialect, but none have compared usage between varieties. Furthermore, differences in research methodology and aims prevent comparison of the prior results. This study investigates the meaning/functions of four invariant tags—eh, yeah, no, and na—in New Zealand, Indian, and British English. The four most frequent meanings are described in detail. The results show differences in the meanings available as well as in their usage frequencies across both items and varieties. This suggests that varietal differences at the level above propositional understanding could cause problems for intercultural and global communication. This has implications for pedagogy and materials for English for Speakers of Other Languages (ESOL) and English for Specific/Business Purposes, in that global communication in English requires an awareness of these subtle differences at the varietal level.
This paper describes a grammatically motivated system for extracting opinionated text. A technique for extracting appraisal expressions has been described in previous work, using manually constructed syntactic linkages to locate targets of the opinions. The system extracts attitudes using a general lexicon—and some candidate targets using a domain-specific lexicon—and finds additional targets using the syntactic linkages. In this paper, we discuss a technique for automatically learning the syntactic linkages from a list of all extracted attitudes and the list of candidate targets. The accuracy of the new learned linkages is comparable to the accuracy of the old manual linkages.
Experimental laboratory studies, often conducted with college student subjects, have proposed several linguistic phenomena as indicative of speaker deception. We have identified a subset of these phenomena that can be formalized as a linguistic model. The model incorporates three classes of language-based deception cues: (1) linguistic devices used to avoid making a direct statement of fact, for example, hedges; (2) preference for negative expressions in word choice, syntactic structure, and semantics; (3) inconsistencies with respect to verb and noun forms, for example, verb tense changes. The question our research addresses is whether the cues we have adapted from laboratory studies will recognize deception in real-world statements by suspects and witnesses.
The issue addressed here is how to test the accuracy of these linguistic cues with respect to identifying deception. To perform the test, we assembled a corpus of criminal statements, police interrogations, and civil testimony that we annotated in two distinct ways, first for language-based deception cues and second for verification of the claims made in the narrative data. The paper discusses the possible methods for building a corpus to test the deception cue hypothesis, the linguistic phenomena associated with deception, and the issues involved in assembling a forensic corpus.
The study examines change and variation in the system of English predicate complementation in recent times, focusing on the matrix predicate submit and its sentential complements. It draws on three major corpora: the Corpus of English Novels (the CEN), comprising some 18 million words; the full Bank of English Corpus, comprising some 524 million words; and the Corpus of Contemporary American English, comprising some 360 million words in the version used. The sentential complements in question are of the to infinitive and to -ing types. It is shown that there are sharp grammatical differences between these two types of complement in English. However, submit is a matrix verb that has selected both types in recent English. In the CEN, representing usage from about a century ago, to infinitive complements predominated over to -ing complements by a ratio of over five to one, but in current English, to -ing complements predominate over to infinitives by a ratio of almost two to one. The study examines the nature of the variation and change, and it is pointed out that, in spite of the sharp grammatical differences between the two patterns, in the CEN both types of complement were found in one and the same text by one author, without any apparent distinction relating to genre, register, or style between the two variants. A difference in meaning is also hard to establish on the basis of contrasts generally posited in the literature for to infinitives and -ing forms. However, the study shows that passive and passive-like lower predicates with lower subjects of the type Patient or Undergoer are often associated with submit, and it is suggested that this association may have promoted the grammatical change from to infinitives to to -ing complements in the present case, in the overall context of what has come to be called the Great Complement Shift in the recent literature.
The -ing pattern is virtually unique to English as a pattern of nonfinite complementation, and the grammatical change in question, it is suggested, is an example of system-internal change. The present study suggests that it is desirable to conduct follow-up work on the nature of the semantic roles of lower subjects as a factor bearing on grammatical change in the case of predicates selecting nonfinite sentential complements in recent centuries.
In order to adjust observed frequencies of occurrence, previous studies have suggested a variety of measures of dispersion and adjusted frequencies. In a previous study, I reviewed many of these measures and suggested an alternative measure, DP (for ‘deviation of proportions’), which I argued to be conceptually simpler and more versatile than many competing measures. However, despite the relevance of dispersion for virtually all corpus-linguistic work, it remains very much an under-researched topic: to the best of my knowledge, there is not a single study investigating how different measures compare to each other when applied to large datasets, nor is there any work that attempts to determine how different measures match up with the kind of psycholinguistic data that dispersions and adjusted frequencies are supposed to represent. This article takes exploratory steps in both of these directions.
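The arithmetic behind DP is compact enough to sketch. The following is a minimal illustration (in Python), assuming the corpus has already been divided into parts of known sizes; the example part sizes and counts are invented for demonstration.

```python
# Illustrative computation of DP ("deviation of proportions").
# A corpus is split into n parts; s[i] is part i's share of the corpus tokens,
# v[i] is the share of the target word's occurrences that fall in part i.
# DP = 0.5 * sum(|v[i] - s[i]|): 0 means a perfectly even spread,
# values approaching 1 mean the word is clumped into few parts.

def dp(part_sizes, word_counts):
    """part_sizes: token counts of the corpus parts.
    word_counts: occurrences of the target word in each part."""
    total_tokens = sum(part_sizes)
    total_hits = sum(word_counts)
    s = [size / total_tokens for size in part_sizes]
    v = [hits / total_hits for hits in word_counts]
    return 0.5 * sum(abs(vi - si) for vi, si in zip(v, s))

# A word spread evenly across three equal parts scores 0;
# a word confined to one of three equal parts scores 2/3.
print(dp([1000, 1000, 1000], [10, 10, 10]))  # 0.0
print(dp([1000, 1000, 1000], [30, 0, 0]))    # ~0.667
```

Note that DP's ceiling depends on the part sizes (for a word confined to part i, DP approaches 1 − s[i]), which is why normalized variants are sometimes used.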
We report on a project investigating the linguistic properties of English scientific texts on the basis of a corpus of journal articles from nine academic disciplines. The goal of the project is to gain insights into registers emerging at the boundaries of computer science and some other discipline (e.g., bioinformatics, computational linguistics, computational engineering). The questions we focus on in this paper are (a) how characteristic is the corpus of the meta-register it represents, and (b) how different/similar are the subcorpora in terms of the more specific registers they instantiate? We analyze the corpus using several data-mining techniques, including feature ranking, clustering, and classification, to see how the subcorpora group in terms of selected linguistic features. The results show that our corpus is well distinguished in terms of the meta-register of scientific writing; also, we find interesting distinctive features for the subcorpora as indicators of register diversification. Apart from presenting the results of our analyses, we will also reflect upon and assess the use of data mining for the tasks of corpus exploration and analysis.
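The kind of grouping described above can be illustrated with a toy computation: comparing subcorpora by the similarity of their linguistic feature vectors. The discipline names, feature choices, and frequencies below are invented for illustration and are not taken from the study.

```python
# Toy sketch: which two subcorpora are most alike in their feature profiles?
# Feature vectors hold hypothetical per-1,000-word rates of, say, passives,
# nominalisations, and first-person pronouns. Cosine similarity compares
# the overall shape of the profiles regardless of text length.

import math
from itertools import combinations

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

subcorpora = {
    "bioinformatics":            [14.0, 22.0, 3.0],
    "computational_linguistics": [12.0, 20.0, 5.0],
    "mechanical_engineering":    [18.0, 15.0, 1.0],
}

# Find the most similar pair of subcorpora.
best = max(combinations(subcorpora, 2),
           key=lambda pair: cosine(subcorpora[pair[0]], subcorpora[pair[1]]))
print(best)
```

Real register studies would use many more features (and techniques such as clustering or classification over them), but the comparison step reduces to measures of this kind.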
Many researchers have found that some words or constructions tend to co-occur with words representing a positive or negative semantic nuance, demonstrating that these words have a certain semantic preference (e.g., the negative preference of cause in Stubbs, 1995). Other researchers have explored the negative or positive associations of words taken out of context, their semantic orientation (Osgood et al., 1957; Turney and Littman, 2003). In this paper, we investigate how well a word’s semantic orientation correlates with its semantic preference. We use the quantitative method developed by Dilts and Newman (2006) to measure how strongly a large number of nouns in the British National Corpus prefer to collocate with positive or negative orientation adjectives. We then compare each noun’s semantic preference to its rating for ‘pleasure’ from the Affective Norms for English Words (ANEW) (Bradley and Lang, 1999), an established psychological measure of a word’s semantic orientation. We find a surprisingly large number of nouns with negative semantic orientation but positive semantic preference: that is, ‘bad’ nouns preferring to collocate with ‘good’ adjectives. By contrast, only a small number of ‘good’ nouns attracted more ‘bad’ adjectives. Our results suggest an interesting mismatch in the way nouns are modified: while ‘good’ nouns attract primarily positive adjectives (further reinforcing their semantic orientation), ‘bad’ nouns attract both negative (reinforcing) and positive (qualifying) adjectives that have a greater transformative effect on the semantics of the noun.
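A semantic-preference score of the general kind described above can be sketched as follows. This is an illustration in the spirit of the collocation-based approach, not the exact Dilts and Newman (2006) formula; the counts are invented.

```python
# Illustrative semantic-preference score for a noun, computed from counts of
# its co-occurrences with adjectives rated positive vs. negative.
# The score runs from -1 (collocates only with negative adjectives)
# to +1 (collocates only with positive adjectives).

def semantic_preference(pos_collocates, neg_collocates):
    total = pos_collocates + neg_collocates
    if total == 0:
        return 0.0
    return (pos_collocates - neg_collocates) / total

# The mismatch the study reports: a noun with negative semantic orientation
# (a low ANEW 'pleasure' rating) can still score positive here if it mostly
# attracts 'good' adjectives.
print(semantic_preference(30, 10))  # 0.5
```

Comparing such a preference score against an independent orientation rating (e.g., the ANEW 'pleasure' norm) is what exposes the 'bad noun, good adjectives' pattern.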
We used the method proposed in Kilgarriff (2001) to assess corpus similarity over a short period of time, both within topic and across topics. The corpus samples were drawn from a Portuguese journalistic corpus. The corpus spans eight years (from 1991 to 1998) and comprises article extracts marked with the year segment, half-year segment, and newspaper section of publication. We analyzed the corpus by taking as reference each text in the time interval and comparing it with all texts published in different periods. We observed that (i) the similarity between two texts within the same topic generally decreases as the time gap between them increases, with the decrease more pronounced for some topics, and (ii) in some cases the texts on one topic become, over time, as different as two texts from different topics. Since the ultimate goal of our work is to understand how changes in corpus similarity affect the performance of a named entity tagger, we also measured similarity based on frequency lists containing only capitalized words and on lists containing only lowercase words. The former similarity compares the corpora from the viewpoint of their named-entity content, whereas the latter approximates a comparison of the contexts surrounding the named entities. The results show that the similarity values based on these lists also generally decrease over time, even though the decreasing profiles are topic-dependent.
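One of the similarity measures associated with Kilgarriff's work compares the frequency lists of two corpora via a chi-square statistic divided by its degrees of freedom ("chi-by-degrees-of-freedom"). The sketch below illustrates that idea; details such as the choice of n and normalisation vary, so this should be read as an approximation rather than the exact implementation used in the study.

```python
# Hedged sketch of a chi-by-degrees-of-freedom corpus-similarity score.
# Computed over the n most frequent words of the two corpora combined;
# lower values indicate more similar corpora (identical corpora score 0).

from collections import Counter

def cbdf(freq_a, freq_b, n=500):
    """freq_a, freq_b: Counters of word frequencies for the two corpora."""
    size_a, size_b = sum(freq_a.values()), sum(freq_b.values())
    common = (freq_a + freq_b).most_common(n)
    chi2 = 0.0
    for word, joint in common:
        # Expected counts if the word were distributed by corpus size alone.
        exp_a = joint * size_a / (size_a + size_b)
        exp_b = joint * size_b / (size_a + size_b)
        chi2 += (freq_a[word] - exp_a) ** 2 / exp_a
        chi2 += (freq_b[word] - exp_b) ** 2 / exp_b
    return chi2 / len(common)

a = Counter({"the": 50, "cat": 10, "sat": 5})
b = Counter({"the": 50, "dog": 10, "ran": 5})
print(cbdf(a, a, n=3))  # 0.0 for identical corpora
print(cbdf(a, b, n=4))  # positive: the vocabularies diverge
```

Restricting the frequency lists to capitalized-only or lowercase-only words, as in the study, changes only the input Counters, not the measure itself.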
The present study investigates the relationship between the discourse functions of lexical bundles found in classroom teaching and their position. Eighty-four lexical bundles, frequently occurring four-word combinations identified earlier in university classroom talk (Biber, Conrad, and Cortes, 2004), are tracked in the first six Vocabulary-Based Discourse Units (VBDUs) also identified previously (Biber, Csomay, Jones, and Keck, 2004) of 176 university lectures. Expressions such as you might want to, I would like to, if you look at, and in the case of, among others, are traced in tandem with their previously identified classification of discourse functions. While earlier studies reported on the relationship between the bundles’ discourse functions and their position in the first three discourse units (Cortes and Csomay, 2007), there are no studies yet on how the frequency patterns may change in the second set of three discourse units.
The findings of this study show a sharp increase in the use of referential bundles and of discourse organizers expressing topic elaboration in the second set of discourse units. At the same time, the use of bundles expressing stance, especially those referring to personal ability and personal intention, and of discourse organizers expressing topic introduction, drops in the second set of discourse units. These findings provide further, lexical evidence for the claim that a strong relationship exists between intra-textual linguistic variation and the corresponding shift in discourse functions in university classes (Csomay, 2005, 2007).