Annotating topological fields and chunks - and revising POS tags at the same time

Authors:
Frank Henrik Müller;Tylman Ule
Affiliations:
Seminar für Sprachwissenschaft, Universität Tübingen;Seminar für Sprachwissenschaft, Universität Tübingen
Venue:
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Year:
2002

Abstract

Annotating a corpus of German with chunks, topological fields and clause boundaries is both a goal in itself and a step towards further syntactic annotation. Such partial annotation can serve as data for testing linguistic hypotheses, and it can be used to pre-structure the corpus for further linguistic annotation steps. If, however, the underlying part-of-speech (POS) annotation is imperfect, its errors are passed on to the subsequent levels of annotation and increase the number of errors on those levels. Errors are especially damaging when they affect the POS tags that provide the framework of the German sentence by demarcating the topological fields and the clause boundaries (e.g. the tags for subordinators and verbs). This paper presents a method to automatically annotate a corpus of German with chunks, topological fields and clause boundaries and, at the same time, to improve tagging accuracy in order to increase overall annotation accuracy. Tag improvement relies primarily on the linguistic knowledge encoded in the grammar used for annotating the topological fields.