A Dutch Chunker as a Basis for the Extraction of Linguistic Knowledge

in Computational Linguistics in the Netherlands 2002

We have developed a fully automatic recursive chunker for unrestricted Dutch text, intended as a basis for extracting linguistic and terminological information. The chunker follows the approach adopted for German in the YAC chunker. It builds flat annotations of (maximal) syntactic constituents using a multi-pass algorithm.
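
The multi-pass idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the chunker's actual rule set: the tagset (DET, ADJ, N, PREP, PART, TE, V), the three rule groups and all function names are hypothetical. Each pass rewrites the token sequence, and the passes repeat until none of them changes anything, so a chunk built in one round (here a PP inside a prenominal participle) can feed a rule in a later round.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical, simplified tagset: DET, ADJ, N, PREP, PART (participle),
# TE (infinitival "te") and V (infinitive). Not the original tool's tagset.
Token = Tuple[str, str]  # (word, tag)


@dataclass
class Chunk:
    label: str
    children: List["Item"]


Item = Union[Token, Chunk]

MODIFIERS = {"ADJ", "AP"}  # items allowed between determiner and head noun
PP_RULES = {("PREP", "NP"): "PP"}
AP_RULES = {("PP", "PART"): "AP", ("TE", "V"): "AP"}  # prenominal modifiers


def cat(item: Item) -> str:
    """Category of a raw token (its tag) or of a chunk (its label)."""
    return item.label if isinstance(item, Chunk) else item[1]


def np_pass(items: List[Item]) -> bool:
    """DET? MOD* N -> NP, and DET? MOD* NP -> one flat NP (needs DET or MOD)."""
    changed, i = False, 0
    while i < len(items):
        j = i
        if cat(items[j]) == "DET":
            j += 1
        while j < len(items) and cat(items[j]) in MODIFIERS:
            j += 1
        if j < len(items) and cat(items[j]) == "N":
            items[i:j + 1] = [Chunk("NP", items[i:j + 1])]
            changed = True
        elif j < len(items) and j > i and cat(items[j]) == "NP":
            # Splice the existing NP's children in, so the NP stays flat.
            items[i:j + 1] = [Chunk("NP", items[i:j] + items[j].children)]
            changed = True
        i += 1
    return changed


def pair_pass(items: List[Item], rules: Dict[Tuple[str, str], str]) -> bool:
    """Left-to-right pass that merges adjacent pairs listed in `rules`."""
    changed, i = False, 0
    while i + 1 < len(items):
        label = rules.get((cat(items[i]), cat(items[i + 1])))
        if label:
            items[i:i + 2] = [Chunk(label, items[i:i + 2])]
            changed = True
        i += 1
    return changed


def chunk(tokens: List[Token]) -> List[Item]:
    """Repeat the NP, PP and AP passes until no pass changes anything."""
    items: List[Item] = list(tokens)
    changed = True
    while changed:
        changed = np_pass(items)
        changed = pair_pass(items, PP_RULES) or changed
        changed = pair_pass(items, AP_RULES) or changed
    return items


def render(item: Item) -> str:
    """Bracketed string for a chunk; plain word for a token."""
    if isinstance(item, Chunk):
        return "[" + item.label + " " + " ".join(render(c) for c in item.children) + "]"
    return item[0]


phrase = [("tegen", "PREP"), ("de", "DET"), ("uit", "PREP"),
          ("ioniserende", "ADJ"), ("stralingen", "N"),
          ("voortspruitende", "PART"), ("gevaren", "N")]
print(" ".join(render(x) for x in chunk(phrase)))
# [PP tegen [NP de [AP [PP uit [NP ioniserende stralingen]] voortspruitende] gevaren]]
```

In this sketch a bare noun such as gevaren is chunked as an NP in the first round; once the participial modifier in front of it has been assembled, a later round merges determiner, modifier and noun into one flat NP. The splicing step keeps that NP flat while still allowing the embedded PP inside the modifier.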

We describe the chunking procedure and illustrate the chunker's coverage with examples such as PPs and NPs with prenominal modification, e.g. tegen de uit ioniserende stralingen voortspruitende gevaren ('against the dangers arising from ionizing radiation') or de te fuseren vennootschappen ('the companies to be merged'). We also illustrate its use in term candidate extraction from about 20 million words of social security documents from Flanders.
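
One simple way to turn chunker output into term candidates is to count normalised NP chunks. The sketch below is an assumption-laden illustration: the determiner stripping, the frequency threshold min_freq, the tagset and the function names are all hypothetical and are not the filtering criteria used on the Flemish corpus.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Token = Tuple[str, str]  # (word, tag); simplified, hypothetical tagset


def candidate_key(np_tokens: List[Token]) -> str:
    """Normalise an NP chunk: drop determiners and lower-case the words."""
    return " ".join(word.lower() for word, tag in np_tokens if tag != "DET")


def term_candidates(np_chunks: Iterable[List[Token]],
                    min_freq: int = 2) -> List[Tuple[str, int]]:
    """Count normalised NPs that contain a noun; keep those seen >= min_freq times."""
    counts = Counter(candidate_key(np) for np in np_chunks
                     if any(tag == "N" for _, tag in np))
    # most_common() orders by descending frequency.
    return [(term, n) for term, n in counts.most_common() if n >= min_freq]


nps = [
    [("de", "DET"), ("sociale", "ADJ"), ("zekerheid", "N")],
    [("Sociale", "ADJ"), ("zekerheid", "N")],
    [("de", "DET"), ("kinderbijslag", "N")],
    [("een", "DET"), ("werkloosheidsuitkering", "N")],
    [("de", "DET"), ("kinderbijslag", "N")],
    [("de", "DET"), ("sociale", "ADJ"), ("zekerheid", "N")],
]
print(term_candidates(nps))
# [('sociale zekerheid', 3), ('kinderbijslag', 2)]
```

Stripping the determiner and lower-casing lets de sociale zekerheid and Sociale zekerheid count as the same candidate.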

Computational Linguistics in the Netherlands 2002: Selected Papers from the Thirteenth CLIN Meeting