EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
TüSBL: a similarity-based chunk parser for robust syntactic processing
HLT '01 Proceedings of the first international conference on Human language technology research
Experiments in German noun chunking
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Significance tests for the evaluation of ranking methods
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
More and more text corpora are available electronically. They contain information about the linguistic and lexicographic properties of words and word combinations. The amount of data is too large to extract this information manually, so we need means of (semi-)automatic processing, i.e., we need to analyse the text in order to extract the relevant information.

The question is: what are the requirements for a text-analysing tool, and do existing systems meet the needs of lexicographic acquisition? The hypothesis is that the better and more detailed the off-line annotation, the better and faster the on-line extraction. However, the more detailed the off-line annotation, the more complex the grammar, the more time-consuming and difficult the grammar development, and the slower the parsing process.

For application as an analysing tool in computational lexicography, a symbolic chunker with a hand-written grammar seems a good choice. The available chunkers for German, however, do not provide all of the additional information needed for this task, such as head lemmas, morpho-syntactic information, and lexical or semantic properties, which are useful if not necessary for extraction processes. We therefore decided to build a recursive chunker for unrestricted German text within the framework of the IMS Corpus Workbench (CWB).
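To make the kind of annotation discussed above concrete, the following is a minimal sketch, not the paper's actual chunker: it matches a single hand-written noun-chunk pattern (ART? ADJA* NN+, using STTS tag names) over POS-tagged, lemmatised tokens and records each chunk's span together with the lemma of its head noun. The grammar rule, function names, and example sentence are invented for illustration.

```python
# Hypothetical illustration of symbolic noun chunking over tagged text.
# Tags follow the STTS tagset (ART = article, ADJA = attributive
# adjective, NN = common noun); the rule and data are illustrative only.
from typing import List, Tuple

Token = Tuple[str, str, str]  # (word form, lemma, STTS POS tag)

def noun_chunks(tokens: List[Token]) -> List[dict]:
    """Greedily match ART? ADJA* NN+ and record each chunk's text
    plus the lemma of its head (taken here to be the last noun)."""
    chunks, i, n = [], 0, len(tokens)
    while i < n:
        j = i
        if j < n and tokens[j][2] == "ART":      # optional article
            j += 1
        while j < n and tokens[j][2] == "ADJA":  # any attributive adjectives
            j += 1
        k = j
        while k < n and tokens[k][2] == "NN":    # one or more nouns
            k += 1
        if k > j:  # at least one noun: emit the chunk spanning i..k
            chunks.append({
                "text": " ".join(t[0] for t in tokens[i:k]),
                "head_lemma": tokens[k - 1][1],
            })
            i = k
        else:      # no chunk starts here; advance one token
            i += 1
    return chunks

sent = [("die", "die", "ART"), ("neuen", "neu", "ADJA"),
        ("Textkorpora", "Textkorpus", "NN"), ("sind", "sein", "VAFIN"),
        ("elektronisch", "elektronisch", "ADJD")]
print(noun_chunks(sent))
# → [{'text': 'die neuen Textkorpora', 'head_lemma': 'Textkorpus'}]
```

A real system would, as the abstract notes, also carry morpho-syntactic features (case, number, gender) and apply its rules recursively, e.g. to embedded prepositional phrases; this sketch only shows why head-lemma annotation makes downstream lexicographic extraction a simple lookup over chunks rather than a re-parse of the raw text.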