Words (don’t come easy): The Automatic Retrieval and Analysis of Popular Song Lyrics

In: From Data to Evidence in English Language Research
Authors: David Brett and Antonio Pinna

One text type that mainstream corpus linguistics research largely ignored until recently is the lyrics of popular songs. Three recent works, by , and , are ground-breaking studies. However, they are based on relatively small samples. The current work will describe the compilation of a large (10 million tokens) corpus of popular song lyrics in English, divided into sub-genres: the Sassari Lyrics (SLY) Corpus. The texts were gathered by web crawling the index pages of an online song repository. We will then analyze the keywords of each sub-genre, together with the keywords that sub-genres share, highlighting similarities and differences between sub-genres. The first part of this paper will discuss the procedures adopted to retrieve the song lyrics, along with metadata such as date, author, album and sub-genre. The repository proved somewhat unreliable in attributing artists to musical sub-genres; alternative semi-automatic processes therefore had to be developed. Several other reliability issues will also be discussed: songs in foreign languages, covers, and variation in song titles and artist names all had to be filtered out or normalized. The second part will present preliminary results of the keyword analysis. While each sub-genre (ALTERNATIVE ROCK, COUNTRY, HIP HOP, HEAVY METAL, POP, R&B and ROCK) had a considerable number of keywords, the keywords of some sub-genres, such as HIP HOP and HEAVY METAL, were highly characteristic lexical items, whereas those of others, such as POP and R&B, were mainly grammatical items with very high frequencies. The latter two sub-genres share so many keywords that it could be argued that, at least on a textual basis, they are essentially indistinguishable.
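The keyword comparison described above can be illustrated with a minimal sketch. The abstract does not say which keyness statistic was used; the example below assumes the log-likelihood (G2) measure that is common in corpus linguistics. The function names `log_likelihood` and `keywords`, and the toy token lists in the usage note, are hypothetical and are not taken from the SLY Corpus.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) keyness score for one word.

    a = frequency in the target corpus, b = frequency in the reference corpus,
    c = size of the target corpus, d = size of the reference corpus (in tokens).
    """
    # Expected frequencies if the word were spread evenly across both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target_tokens, reference_tokens, top_n=10):
    """Return the top_n positive keywords of the target versus the reference."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c = sum(target.values())
    d = sum(reference.values())
    if c == 0 or d == 0:
        return []
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        # Keep only words that are relatively more frequent in the target.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]
```

Under these assumptions, one sub-genre could be scored against the rest of the corpus by calling, for instance, `keywords(hip_hop_tokens, other_tokens)`, and shared keywords between two sub-genres could then be found by intersecting their result lists.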
