Chapter 4: Topics and Subjects in German Newspaper Editorials: A Corpus Study

In: Information Structuring in Discourse
Peter Bourgonje
Manfred Stede
We discuss an approach to identifying aboutness topics in authentic text, namely a corpus of German newspaper editorials. Following dedicated annotation guidelines, 176 texts have been labelled with topics, and this layer has been added to the corpus, which we have made available in the search engine ANNIS3. Since syntactic annotation is also available, aboutness topics can now be correlated with grammatical subjects, which are generally assumed to coincide frequently with the topic. As expected, the two notions mostly coincide, but in 26% of cases the topic is not the subject. In a qualitative analysis, we propose a classification of the reasons for these mismatches. Furthermore, we present experiments on automatically classifying aboutness topics on the basis of various features of the noun phrases and their context. We find that the simple “subject implies topic” baseline is difficult to beat, but we outperform it by 2 points in the F1-measure.
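To make the “subject implies topic” baseline and its evaluation concrete, here is a minimal Python sketch. It assumes that each candidate noun phrase carries a binary subject flag (taken from the syntactic layer) and a binary gold topic label. The `NounPhrase` class, the helper functions and the five-item toy list are hypothetical illustrations, not the authors' code or corpus data. The mismatch share is computed under one possible reading of the 26% figure, namely the share of annotated topics that are not subjects.

```python
from dataclasses import dataclass


@dataclass
class NounPhrase:
    # Hypothetical record for one candidate NP; field names are illustrative.
    is_subject: bool  # from the syntactic annotation layer
    is_topic: bool    # gold aboutness-topic label


def subject_baseline(np_: NounPhrase) -> bool:
    """Predict 'topic' exactly when the NP is the grammatical subject."""
    return np_.is_subject


def f1(gold: list[bool], pred: list[bool]) -> float:
    """F1 for the positive class ('is topic')."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum(p and not g for g, p in zip(gold, pred))
    fn = sum(g and not p for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def non_subject_topic_share(nps: list[NounPhrase]) -> float:
    """Share of gold topics that are NOT grammatical subjects (one reading of 'mismatch')."""
    topics = [n for n in nps if n.is_topic]
    if not topics:
        return 0.0
    return sum(not n.is_subject for n in topics) / len(topics)


# Invented toy data for illustration only; NOT taken from the corpus.
nps = [
    NounPhrase(is_subject=True, is_topic=True),
    NounPhrase(is_subject=True, is_topic=True),
    NounPhrase(is_subject=True, is_topic=False),   # subject but not topic
    NounPhrase(is_subject=False, is_topic=True),   # topic but not subject
    NounPhrase(is_subject=False, is_topic=False),
]

gold = [n.is_topic for n in nps]
pred = [subject_baseline(n) for n in nps]

baseline_f1 = f1(gold, pred)
mismatch = non_subject_topic_share(nps)

print(round(baseline_f1, 3))  # 0.667
print(round(mismatch, 3))     # 0.333
```

A feature-based classifier would replace `subject_baseline` with a learned decision over further noun-phrase and context features, and could be compared against the baseline with the same `f1` function on the same gold labels.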

