High precision treebanking: blazing useful trees using POS information

  • Authors:
  • Takaaki Tanaka;Francis Bond;Stephan Oepen;Sanae Fujita

  • Affiliations:
  • NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation;NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation;Universitetet i Oslo and CSLI, Stanford;NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation

  • Venue:
  • ACL '05 Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics
  • Year:
  • 2005

Abstract

In this paper we present a quantitative and qualitative analysis of annotation in the Hinoki treebank of Japanese, and investigate a method of speeding up annotation by using part-of-speech (POS) tags. The Hinoki treebank is a Redwoods-style treebank of Japanese dictionary definition sentences. A total of 5,000 sentences were annotated by three different annotators, and their agreement was evaluated. Average agreement was 65.4% under strict agreement and 83.5% under labeled precision. Exploiting POS tags allowed the annotators to choose the best parse with 19.5% fewer decisions.
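The two agreement figures in the abstract can be illustrated with a small sketch. The abstract does not define the measures, so the definitions here are assumptions: strict agreement is taken to be the fraction of sentences for which two annotators selected identical trees, and labeled precision the fraction of one annotator's labeled constituents that also appear in the other's tree. Representing a tree as a set of (label, start, end) brackets, and the names `strict_agreement` and `labeled_precision`, are hypothetical choices made for illustration, not the paper's implementation.

```python
from typing import FrozenSet, List, Tuple

# Assumed representation: a tree is a set of labeled brackets (label, start, end).
Bracket = Tuple[str, int, int]
Tree = FrozenSet[Bracket]


def strict_agreement(a: List[Tree], b: List[Tree]) -> float:
    """Fraction of sentences where both annotators chose identical trees."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty, equally long lists of trees")
    identical = sum(1 for x, y in zip(a, b) if x == y)
    return identical / len(a)


def labeled_precision(a: List[Tree], b: List[Tree]) -> float:
    """Fraction of annotator a's labeled brackets also found in b's trees."""
    if len(a) != len(b):
        raise ValueError("need equally long lists of trees")
    matched = sum(len(x & y) for x, y in zip(a, b))
    total = sum(len(x) for x in a)
    return matched / total if total else 0.0


# Toy data (invented for illustration): two sentences, two annotators.
ann_a = [
    frozenset({("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)}),
    frozenset({("S", 0, 2), ("NP", 0, 1)}),
]
ann_b = [
    frozenset({("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)}),
    frozenset({("S", 0, 2), ("NP", 0, 2)}),
]

print(strict_agreement(ann_a, ann_b))
print(labeled_precision(ann_a, ann_b))
```

Under these assumed definitions, two annotators can disagree on the full tree for a sentence while still sharing most of its constituents, which is one way a labeled-precision figure can sit well above a strict-agreement figure.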