We have developed a fully automatic recursive chunker for unrestricted Dutch text, intended as a basis for extracting linguistic and terminological information. The chunker follows the approach adopted for the analysis of German in the YAC chunker. It builds up flat annotations of (maximal) syntactic constituents using a multi-pass algorithm.
We describe the chunking procedure and illustrate the coverage of the chunker with examples such as PPs/NPs with prenominal modification: tegen de uit ioniserende stralingen voortspruitende gevaren ("against the dangers arising from ionising radiation") and de te fuseren vennootschappen ("the companies to be merged"). We also show its use in extracting term candidates from about 20 million words of social security documents from Flanders.
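To make the idea of multi-pass chunking concrete, the following minimal Python sketch applies a short ordered list of passes to POS-tagged input, each pass grouping the chunks built by earlier ones. It is a toy illustration only, not the chunker's actual rule set: the tag names (DET, PREP, ADJ, N, TE, V), the five rules, the `Node` class, and the nested bracket display are all assumptions made for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                      # POS tag (leaf) or chunk label
    word: str = ""                  # surface form, leaves only
    children: list = field(default_factory=list)

    def bracketed(self) -> str:
        """Render a leaf as its word and a chunk as [LABEL ...]."""
        if not self.children:
            return self.word
        inner = " ".join(child.bracketed() for child in self.children)
        return f"[{self.label} {inner}]"


# Hypothetical ordered passes: (chunk label, regex over space-terminated labels).
PASSES = [
    ("AP", r"TE V "),                        # "te" + infinitive used prenominally
    ("PP", r"PREP (?:ADJ )*N "),             # preposition + determinerless noun group
    ("AP", r"PP ADJ "),                      # adjective/participle with a PP complement
    ("NP", r"(?:DET )?(?:(?:AP|ADJ) )*N "),  # optional determiner, modifiers, noun
    ("PP", r"PREP NP "),                     # preposition + full NP
]


def apply_pass(nodes, label, pattern):
    """Replace each left-to-right match of `pattern` with a single chunk node."""
    labels = "".join(node.label + " " for node in nodes)
    out, pos = [], 0
    # The lookbehind forces matches to start at a label boundary.
    for match in re.finditer(r"(?<!\S)" + pattern, labels):
        start = labels[: match.start()].count(" ")   # index of first matched node
        end = start + match.group().count(" ")       # one space per matched node
        out.extend(nodes[pos:start])
        out.append(Node(label, children=nodes[start:end]))
        pos = end
    out.extend(nodes[pos:])
    return out


def chunk(tagged):
    """Run every pass once, in order, over a list of (word, tag) pairs."""
    nodes = [Node(tag, word) for word, tag in tagged]
    for label, pattern in PASSES:
        nodes = apply_pass(nodes, label, pattern)
    return nodes


example = [("de", "DET"), ("te", "TE"), ("fuseren", "V"), ("vennootschappen", "N")]
print(" ".join(node.bracketed() for node in chunk(example)))
# prints: [NP de [AP te fuseren] vennootschappen]
```

Under the same assumed tags, the longer example is analysed as a single PP containing an NP whose prenominal modifier itself contains a PP. Because each pass only sees the labels produced by earlier passes, the order of the rules determines which embeddings can be recognised.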