Corpus annotation/management tools for the project: balanced corpus of contemporary written Japanese

Authors:
Yuji Matsumoto
Affiliations:
Graduate School of Information Science, Nara Institute of Science and Technology, Takayama, Ikoma, Nara, Japan
Venue:
LKR'08 Proceedings of the 3rd international conference on Large-scale knowledge resources: construction and application
Year:
2008

Abstract

This paper introduces our work on corpus annotation and the development of corpus management tools in the Japanese government-funded project Balanced Corpus of Contemporary Written Japanese. We are investigating several levels of text annotation, covering morphological analysis and POS tagging, syntactic dependency parsing, predicate-argument analysis, and coreference analysis. Since automatic annotation is not perfect, we need annotated-corpus management tools that facilitate corpus browsing and error correction. In particular, we take up our corpus management tool ChaKi, explain its functions, and discuss how we are trying to maintain the consistency of corpus annotation.
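The abstract does not say how ChaKi detects inconsistent annotation. What follows is a minimal sketch of one common kind of check, assuming POS-tagged text is stored as (surface form, tag) pairs. It lists every surface form that received more than one POS tag, so that an annotator can review the minority tags as candidate errors. The function name `find_tag_variations`, the English tag names, and the toy sentences are hypothetical. They are not taken from ChaKi or from the Balanced Corpus of Contemporary Written Japanese.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical flat representation: a sentence is a list of
# (surface form, POS tag) pairs. This is NOT ChaKi's actual data model.
Sentence = List[Tuple[str, str]]


def find_tag_variations(corpus: List[Sentence]) -> Dict[str, Counter]:
    """Return surface forms annotated with more than one POS tag.

    Each entry maps a surface form to a Counter of tag -> frequency,
    so the rarer tags can be inspected as possible annotation errors.
    """
    tags_by_form: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        for surface, pos in sentence:
            tags_by_form[surface][pos] += 1
    # Keep only forms whose annotations disagree across the corpus.
    return {form: tags for form, tags in tags_by_form.items() if len(tags) > 1}


if __name__ == "__main__":
    # Toy, hand-made data for illustration only; the third sentence
    # deliberately mis-tags the topic particle "は" as a noun.
    corpus = [
        [("私", "pronoun"), ("は", "particle"), ("学生", "noun"), ("です", "auxiliary")],
        [("これ", "pronoun"), ("は", "particle"), ("本", "noun"), ("です", "auxiliary")],
        [("彼", "pronoun"), ("は", "noun"), ("先生", "noun"), ("です", "auxiliary")],
    ]
    for form, tags in sorted(find_tag_variations(corpus).items()):
        # Most frequent tag first; the minority tags are the suspects.
        print(form, tags.most_common())
```

On the toy data this prints `は [('particle', 2), ('noun', 1)]`. Keying only on the surface form also flags genuinely ambiguous words, so the output is a list of candidates for manual review rather than a list of confirmed errors. Using the surrounding context of each occurrence would narrow it down, and that review step is the kind of browsing and correction work the abstract assigns to management tools.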