A classical Chinese corpus with nested part-of-speech tags

  • Authors: John Lee

  • Affiliations: City University of Hong Kong

  • Venue: LaTeCH '12 Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

  • Year: 2012

Abstract

We introduce a corpus of classical Chinese poems that has been word-segmented and tagged with parts of speech (POS). Because the concept of a 'word' in Chinese is ill-defined, previous Chinese corpora lack a standard for word segmentation; this leads to inconsistent POS tags and hinders interoperability among corpora. We address this problem with nested POS tags, which accommodate different theories of wordhood and support research objectives that require annotations of the 'word' at different levels of granularity.
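
To make the idea of nested tags concrete, the following is a minimal sketch of how one string can carry a tag at more than one level of granularity. It is an illustration only, not the corpus's actual format: the `Node` class, the `segment` function, the example phrase 黃河 ("Yellow River"), and the Penn Chinese Treebank style labels `NR`, `JJ`, and `NN` are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A tagged string that may contain smaller tagged strings (hypothetical structure)."""
    text: str
    tag: str
    children: List["Node"] = field(default_factory=list)


def segment(node: Node, depth: int) -> List[Tuple[str, str]]:
    """Return (text, tag) pairs, descending at most `depth` nesting levels."""
    if depth == 0 or not node.children:
        return [(node.text, node.tag)]
    pairs: List[Tuple[str, str]] = []
    for child in node.children:
        pairs.extend(segment(child, depth - 1))
    return pairs


# Assumed example: one analysis treats 黃河 as a single proper noun,
# another treats it as a modifier (黃) followed by a noun (河).
yellow_river = Node("黃河", "NR", [Node("黃", "JJ"), Node("河", "NN")])

coarse = segment(yellow_river, 0)  # one word at the outer level
fine = segment(yellow_river, 1)    # two words at the inner level
```

Under this sketch, a user who treats 黃河 as one word reads only the outer tag, while a user who prefers a finer segmentation reads the inner tags; both views come from the same annotation.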