High precision treebanking: blazing useful trees using POS information

Authors:
Takaaki Tanaka; Francis Bond; Stephan Oepen; Sanae Fujita
Affiliations:
NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation; NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation; Universitetet i Oslo and CSLI, Stanford; NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation
Venue:
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Year:
2005



Abstract

In this paper we present a quantitative and qualitative analysis of annotation in the Hinoki treebank of Japanese, and investigate a method of speeding up annotation by using part-of-speech (POS) tags. The Hinoki treebank is a Redwoods-style treebank of Japanese dictionary definition sentences. Five thousand sentences were annotated by three different annotators, and their agreement was evaluated. Average agreement was 65.4% under strict agreement and 83.5% under labeled precision. Exploiting POS tags allowed the annotators to choose the best parse with 19.5% fewer decisions.
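The two agreement figures measure different things. The sketch below illustrates the difference under some stated assumptions. It assumes each annotator's chosen parse is reduced to a set of labeled constituents `(label, start, end)`, treats strict agreement as exact identity of the chosen trees, and computes labeled precision in the usual PARSEVAL style. It averages over every sentence and every ordered pair of annotators, so the asymmetric precision score does not depend on which annotator is taken as the reference. The span-set representation, the helper names (`strict_agreement`, `labeled_precision`, `average_agreement`), and the averaging scheme are hypothetical and are not the procedure used for Hinoki.

```python
from itertools import permutations
from typing import Callable, FrozenSet, List, Tuple

# Assumed representation: a constituent is (label, start, end) over token
# positions, and a tree is the set of its constituents.
Span = Tuple[str, int, int]
Tree = FrozenSet[Span]


def strict_agreement(ref: Tree, test: Tree) -> float:
    """1.0 if both annotators chose exactly the same tree, else 0.0."""
    return 1.0 if ref == test else 0.0


def labeled_precision(ref: Tree, test: Tree) -> float:
    """Share of constituents in `test` that also occur, with the same label, in `ref`."""
    if not test:
        return 0.0
    return len(ref & test) / len(test)


def average_agreement(annotations: List[List[Tree]],
                      metric: Callable[[Tree, Tree], float]) -> float:
    """Average `metric` over all sentences and all ordered annotator pairs.

    annotations[i][k] is annotator k's chosen tree for sentence i.
    Ordered pairs make the result independent of which annotator is the reference.
    """
    scores = [metric(ref, test)
              for trees in annotations
              for ref, test in permutations(trees, 2)]
    return sum(scores) / len(scores)


# Toy data (invented): one three-token sentence, three annotators.
t1 = frozenset({("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)})
t2 = frozenset({("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)})
annotations = [[t1, t1, t2]]

print(round(average_agreement(annotations, strict_agreement), 3))   # → 0.333
print(round(average_agreement(annotations, labeled_precision), 3))  # → 0.556
```

Labeled precision gives partial credit for shared constituents, such as the common `S` node in the toy data. For non-empty trees it is therefore never lower than strict agreement on the same data. This is consistent with the gap between the two reported figures.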