Corpus Linguistics and Linguistic Theory

Papers from the Twentieth International Conference on English Language Research on Computerized Corpora (ICAME 20) Freiburg im Breisgau 1999


Edited by Christian Mair and Marianne Hundt

From being the occupation of a marginal (and frequently marginalised) group of researchers, the linguistic analysis of machine-readable language corpora has moved to the mainstream of research on the English language. In this process an impressive body of results has accumulated which, over and above the intrinsic descriptive interest it holds for students of the English language, forces a major and systematic re-thinking of foundational issues in linguistic theory. Corpus linguistics and linguistic theory was accordingly chosen as the motto for the twentieth annual gathering of ICAME, the International Computer Archive of Modern/Medieval English, which was hosted by the University of Freiburg (Germany) in 1999. The present volume, which presents selected papers from this conference, thus builds on previous successful work in the computer-aided description of English and at the same time represents an attempt at stock-taking and methodological reflection in a linguistic subdiscipline that has clearly come of age.
Contributions cover all levels of linguistic description - from phonology/prosody, through grammar and semantics, to discourse-analytical issues such as genre or gender-specific linguistic usage. They are united by a desire to further the dialogue between the corpus-linguistic community and researchers working in other traditions. The atmosphere ranges from undisguised skepticism (as expressed by Noam Chomsky in an interview which is part of the opening contribution by Bas Aarts) to empirically substantiated optimism (as, for example, in Bernadette Vine's significantly titled contribution Getting things done).

Procedural and declarative information in software manuals

Effects on information use, performance and knowledge


Nicole Ummelen

People who use software manuals want to get something done. Procedural information directly supports this goal, but the usefulness of declarative information in manuals has often been debated. Current research gives rise to the expectation that manual users tend to skip declarative information most of the time. Also, no effects of declarative information in software manuals have yet been found.
In this study, information use and information effects in software manuals are investigated in three experiments that take different user types, different task types and different information arrangements into account. A new technique was applied: the click&read method. This technique enables the software user to use the manual and carry out software tasks at the same time, while information selection and reading times are recorded automatically in logfiles.
For the first time, quantitative data are presented about the amounts of procedural and declarative information that were selected and the times that were spent using these information types. Although procedural information is selected more often and used longer, declarative information appears to be a substantial part of the information selection. Moreover, the results show that using declarative information positively affects performance on future tasks, performance on reasoning tasks and factual knowledge.

Translating Sensitive Texts

Linguistic Aspects


Edited by Karl Simms

This volume brings together twenty-two of the world's leading translation and interpreting theorists to address the issue of sensitivity in translation. Whether in novels or legal documents, the Bible or travel brochures, in translating ancient texts or providing simultaneous interpretation, sensitive subject-matter, contentious modes of expression and the sensibilities of the target audience are the biggest obstacles to acceptance of the translator's work. The contributors bring to bear a wide variety of approaches - generative, cognitive, lexical and functional - in confronting this problem, and in negotiating the competing claims of source cultures and target cultures in the areas of cultural, political, religious and sexual sensitivity. All of the articles are presented here for the first time, and in his Introduction Karl Simms gives an overview of the philosophical and linguistic questions which have motivated translators of sensitive texts through the ages. This book will be of interest to all working translators and interpreters, and to teachers of translation theory and practice.

Computational Linguistics in the Netherlands 1998

Selected Papers from the Ninth CLIN Meeting


Edited by Frank Van Eynde, Ineke Schuurman and Ness Schelkens

This volume provides a selection of the papers which were presented at the ninth conference on Computational Linguistics in the Netherlands (Leuven, 1998). It gives an accurate and up-to-date picture of the lively scene of computational linguistics in the Netherlands and Flanders. In terms of topics the contributions can be grouped under three headings: the use of statistical methods in speech and language processing (6 papers), the analysis of syntactic and semantic phenomena in the framework of computationally oriented formalisms, such as Head-driven Phrase Structure Grammar (5 papers), and the development of NLP applications, such as document processing, dialogue modelling and teaching (3 papers). The volume covers the whole range from theoretical to applied research and development, and is hence of interest to both academia and industry. The target audience consists of advanced students and scholars of computational linguistics, and speech and language processing (Linguistics, Computer Science, Electrical Engineering).

Out of Corpora

Studies in Honour of Stig Johansson


Edited by Hilde Hasselgård and Signe Oksefjell

Corpus Linguistics

Refinements and Reassessments


Edited by Antoinette Renouf and Andrew Kehoe

Throughout history, linguists and literary scholars have been impelled by curiosity about particular linguistic or literary phenomena to seek to observe them in action in original texts. The fruits of each earlier enquiry in turn nourish the desire to continue to acquire knowledge, through further observation of newer linguistic facts.
As time goes by, the corpus linguist operates increasingly in the awareness of what has gone before. Corpus Linguistics, thirty years on, is less an innocent sortie into corpus territory on the basis of a hunch than an informed, critical reassessment of existing analytical orthodoxy, in the light of new data coming on stream.
This volume comprises twenty-two articles penned by members of the ICAME (International Computer Archive of Modern and Mediaeval English) association, which together provide a critical and informed reappraisal of the facts, data, methods and tools of Corpus Linguistics which are available today. Authors reconsider the boundaries of the discipline, exploring its areas of commonality with Sociolinguistics, Language Variation, Discourse Linguistics, and Lexical Statistics and showing how that commonality is potentially of immense benefit to practitioners in the fields concerned.
The volume culminates in the report of a timely and novel expert panel discussion on the role of Corpus Linguistics in the study of English as a global language. This encompasses issues such as English as an international lingua franca, ‘norms’ for global English, and the question of ‘ownership’, or who qualifies as a native speaker.

Synchronic Corpus Linguistics

Papers from the sixteenth International Conference on English Language Research on Computerized Corpora (ICAME 16)


Edited by Carol E. Percy, Charles F. Meyer and Ian Lancashire

Synchronic corpus linguistics contains select papers from the sixteenth International Conference on English Language Research on Computerized Corpora (ICAME 16). The papers reflect the state of the art in the design, analysis, and annotation of corpora. Corpora new and old facilitate the description of single registers of English (e.g., London teenage English, business English) and of specific grammatical topics across registers (e.g., the grammatical flexibility of idioms), including variation studies (e.g., popular vs. technical registers of English). Other corpora permit the comparison of English to other languages (Norwegian, German, Swedish); of L1 English to L2 English; and of English as an original language to English in translation. A number of these papers emphasize pragmatics: indeed, among the papers on spoken English is an assessment of corpora annotated for discourse analysis. Other papers describe different aspects of the automatic analysis of text. Two papers describe semantic analysis of large text corpora composed of news/business text. Automatic grammatical analysis is the subject of other papers: two evaluate existing automatic parsers and wordclass taggers, while two describe how annotated corpora are being used to develop two new and innovative automatic parsers.


Edited by Marianne Hundt, Nadja Nesselhauf and Carolin Biewer

Using the Web as Corpus is one of the recent challenges for corpus linguistics. This volume presents a current state-of-the-art discussion of the topic. The articles address practical problems such as suitable linguistic search tools for accessing the www and the question of register variation, and they probe into methods for culling data from the web. The book also offers a wide range of case studies, covering morphology, syntax, lexis, as well as synchronic and diachronic variation in English. These case studies make use of the two approaches to the www in corpus linguistics – web-as-corpus and web-for-corpus-building. The case studies demonstrate that web data can provide useful additional evidence for a broad range of research questions.

Corpus-Based Research into Language

In honour of Jan Aarts


Edited by Nelleke Oostdijk and Pieter de Haan

For over two decades Jan Aarts has been actively involved in corpus linguistic research. He was the instigator of a large number of projects, and he was responsible for what has become known as the Nijmegen approach to corpus linguistics. It is thanks to him that words like TOSCA and LDB have become household names in the corpus linguistic community.
The present volume has been compiled in his honour. The contributions in it cover a wide range of topics in the field of corpus linguistic research, especially those in which Jan Aarts takes a keen interest: corpus encoding and tagging, parsing and databases, and the linguistic exploration of corpus data. The contributions in this volume discuss work done in this field outside Nijmegen, for the obvious reason that we do not wish to present him with a report on work in which he is himself involved.

Towards a Methodology for the Investigation of Norms in Audiovisual Translation

The Choice between Subtitling and Revoicing in Greece


Fotios Karamitroglou

This volume presents, for the first time, a methodology for investigating the norms which operate in the field of audiovisual translation. Based on the findings of the polysystem approach to translation, the present work aims to demonstrate that it is possible to investigate audiovisual translation and the norms that operate in it in a systematic way.
Human agents, (audiovisual) products, recipients, and the mode itself are thoroughly investigated and stratified into a lower, a middle and an upper level. Specific techniques for collecting and analysing data are suggested.
The model is tentatively applied to the investigation of norms which seem to determine the choice between subtitling and revoicing children's TV programmes in Greece. However, the same model could be applied to the investigation of audiovisual translation norms in any other country. Moreover, with minor modifications, the same model can prove effective for the study of norms in other modes of written translation too. This volume can therefore be of great interest not only to audiovisual translation scholars and practitioners, but also to general translation scholars and students of translation proper.