Annotating a corpus of German with chunks, topological fields and clause boundaries is both a goal in itself and a step towards further syntactic annotation. Such partial annotation can serve as data for testing linguistic hypotheses and as a pre-structuring for further linguistic annotation steps. If, however, the underlying part-of-speech (POS) annotation is imperfect, its errors are passed on to the subsequent levels of annotation and increase the number of errors there. Errors are especially damaging when they affect the POS tags that provide the framework of the German sentence by demarcating the topological fields and the clause boundaries (e.g. the tags for subordinators and verbs). This paper presents a method that automatically annotates a corpus of German with chunks, topological fields and clause boundaries and improves tagging accuracy at the same time, in order to increase overall annotation accuracy. Tag improvement relies primarily on the linguistic knowledge encoded in the grammar for annotating the topological fields.