Academic Vocabulary in Wikipedia Articles: Frequency and Dispersion in Uneven Datasets

In: From Data to Evidence in English Language Research

Abstract

Despite its popularity, the status of Wikipedia in higher education settings remains somewhat controversial, and the linguistic characteristics of the genre have not been exhaustively described. This exploratory paper takes a data-driven approach to assessing the use of academic vocabulary in Wikipedia articles. Our analysis is based on Coxhead’s Academic Word List, and the data come from the Westbury Lab Wikipedia Corpus. We employ methods of statistical data analysis to classify Wikipedia articles according to the frequencies of academic words, and apply the same procedure to a comparable set of texts representing another genre, published research articles (RAs). The unsupervised classification procedure groups the articles according to academic content regardless of topic, which allows us to measure genre-specific similarities. The findings show that academic words are common in both genres and, more interestingly, that in terms of aggregate frequencies of academic words, Wikipedia articles are not markedly different from RAs within the same discipline. That said, we can observe disciplinary differences in the distribution of academic words in Wikipedia: Economics writing contains more academic words than the other two disciplines in focus. Disciplinary differences can likewise be observed in the distribution of individual academic words.
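
As a rough illustration of what "frequencies of academic words" can mean in practice, the sketch below computes a normalized rate of academic-word tokens per 1,000 tokens for each text. It is not the procedure used in the chapter: the word set is a tiny hypothetical stand-in rather than Coxhead’s actual list, the toy texts are invented, and matching exact word forms is a simplification, since the Academic Word List is organized into word families.

```python
import re

# Hypothetical stand-in for an academic word list; NOT the actual AWL.
ACADEMIC_WORDS = {"analysis", "data", "method", "theory"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def academic_rate(text, word_list=ACADEMIC_WORDS, per=1000):
    """Return the number of academic-word tokens per `per` tokens of text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # avoid division by zero on empty input
    hits = sum(1 for tok in tokens if tok in word_list)
    return per * hits / len(tokens)


# Invented toy texts, used only to show the call pattern.
toy_texts = {
    "text_a": "The analysis of the data follows a standard method.",
    "text_b": "The cat sat on the mat and looked at the dog.",
}

rates = {name: academic_rate(text) for name, text in toy_texts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} academic words per 1,000 tokens")
```

Normalizing by text length matters when texts differ widely in size, as they do in an uneven dataset; per-text rates of this kind could then serve as input to an unsupervised grouping step.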
