Search Results
Abstract
We discuss the part-of-speech and lemma annotation of the Dutch PAROLE Internet Corpus.
The PAROLE PoS tagger is a combination of statistical taggers. It includes the Markov tagger TnT and three taggers developed at the INL to exploit information beyond the training data. Lemmas are assigned by a deterministic procedure based on an extensive lexicon.
The output is in some respects not entirely satisfactory; we discuss what can be done about this without having to manually correct the complete corpus.
Abstract
At the Instituut voor de Nederlandse Taal (Dutch Language Institute), DiaMaNT, a diachronic semantic computational lexicon of Dutch, is being developed on the basis of the scholarly historical dictionaries of Dutch. The main purpose of this lexicon is to enhance text accessibility and foster research on the development of concepts. This article explores the feasibility of enriching DiaMaNT with an existing semantic classification by linking a subset of the vocabulary of the Dictionary of Old Dutch to A Thesaurus of Old English.