The retrieval of false anglicisms in newspaper texts

In: Corpus Linguistics 25 Years on

This article describes a project that aimed to build a specialized corpus of Italian newspaper texts and to develop a computational technique for retrieving new false anglicisms from it. Texts were collected over a ten-month span from three Italian newspapers: La Stampa, La Repubblica, and Il Corriere della Sera. The corpus contains about 20 million tokens and approximately 230,000 types. The system was updated automatically on a daily basis, and a word list was obtained at the end of the collection period. This procedure produced a refined word list, which was then searched for false anglicisms. Alongside the computational techniques, careful manual scanning proved indispensable for extracting new false anglicisms. The corpus is available for future work and may be exploited not only to find false anglicisms but also to retrieve anglicisms and neologisms and to analyse lexical features of Italian newspaper language.
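
The abstract does not spell out the retrieval technique, so the following is only a minimal sketch of the general idea, under stated assumptions: token counts are accumulated as new texts arrive, tokens and types are tallied, and words found in a reference list are filtered out to leave a shorter candidate list for manual scanning. The function names, the tokenization rule, the sample sentences, and the `known` word list are all hypothetical and are not taken from the project.

```python
import re
from collections import Counter

# Assumption: a token is a run of letters, optionally joined by a hyphen
# or apostrophe. The project's actual tokenization is not described.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:[-'’][^\W\d_]+)*")


def tokenize(text):
    """Return the lowercased word tokens of a text."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def update_counts(counts, text):
    """Add one text's tokens to the running counts (e.g. a daily update)."""
    counts.update(tokenize(text))
    return counts


def refine(counts, known_words, min_freq=1):
    """Keep types not in the reference list: candidates for manual review."""
    return sorted(
        w for w, n in counts.items() if n >= min_freq and w not in known_words
    )


# Hypothetical sample texts standing in for newspaper articles.
articles = [
    "Il footing è di moda.",
    "Ha fatto footing e poi autostop.",
]

counts = Counter()
for article in articles:
    update_counts(counts, article)

# Hypothetical reference list of ordinary Italian words to exclude.
known = {"il", "è", "di", "moda", "ha", "fatto", "e", "poi"}

n_tokens = sum(counts.values())  # running words in the corpus
n_types = len(counts)            # distinct word forms
candidates = refine(counts, known)

print(n_tokens, n_types, candidates)
```

A filter of this kind only narrows the word list. Deciding whether a surviving candidate is a genuine false anglicism still requires the manual scanning the abstract calls indispensable.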

