Analysis of selective strategies to build a dependency-analyzed corpus

  • Author: Kiyonori Ohtake

  • Affiliation: National Institute of Information and Communications Technology (NICT), Kyoto, Japan

  • Venue: COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
  • Year: 2006

Abstract

This paper discusses sampling strategies for building a dependency-analyzed corpus and analyzes them on different kinds of corpora. We used the Kyoto Text Corpus, a dependency-analyzed corpus of newspaper articles, and prepared the IPAL corpus, a dependency-analyzed corpus of example sentences from dictionaries, as a new and different kind of corpus. The experimental results revealed that the length of the test set controlled the accuracy, and that the longest-first strategy worked well for expanding an existing corpus but not for constructing a corpus from scratch.
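
As a rough illustration of what a longest-first strategy could look like, the sketch below picks the longest sentences from an unannotated pool for annotation. This is not the paper's implementation: the function name `select_longest_first`, the `budget` parameter, and the choice to measure length in tokens over pre-segmented sentences are all assumptions made for the example.

```python
from typing import List, Sequence


def select_longest_first(pool: Sequence[List[str]], budget: int) -> List[List[str]]:
    """Return up to `budget` sentences from `pool`, longest first.

    Each sentence is a list of tokens (assumed to be segmented already).
    Length is measured as the number of tokens, which is an assumption
    of this sketch rather than a detail taken from the paper.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # sorted() is stable, so sentences of equal length keep their pool order.
    ranked = sorted(pool, key=len, reverse=True)
    return ranked[:budget]


# Hypothetical pool of tokenized sentences awaiting annotation.
pool = [["a", "b"], ["c"], ["d", "e", "f"], ["g", "h"]]
selected = select_longest_first(pool, 2)
print(selected)  # [['d', 'e', 'f'], ['a', 'b']]
```

A comparison with other strategies, such as random sampling, would replace only the ranking step and keep the same budget.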