Label correspondence learning for part-of-speech annotation transformation

  • Authors:
  • Muhua Zhu;Huizhen Wang;Jingbo Zhu

  • Affiliations:
  • Northeastern University, Shenyang, China;Northeastern University, Shenyang, China;Northeastern University, Shenyang, China

  • Venue:
  • Proceedings of the 18th ACM conference on Information and knowledge management
  • Year:
  • 2009

Abstract

The performance of machine learning methods depends heavily on the amount of training data available. To enlarge a dataset, it is therefore of interest to study how to unify multiple labeled datasets that follow different annotation standards. In this paper, we focus on unifying datasets for sequence labeling problems, with natural language part-of-speech (POS) tagging as an exemplar application. To this end, we propose a probabilistic approach to transforming the annotations of one dataset to the standard specified by another dataset. The key component of the approach, called label correspondence learning, bridges the annotations of the two datasets. Two methods, designed from distinct perspectives, are proposed to attack this sub-problem. Experiments on two large-scale part-of-speech datasets demonstrate the efficacy of the transformation and label correspondence learning methods.
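
The abstract does not spell out how label correspondence is learned, so the following is only an illustrative sketch of the general idea and not the authors' method. It assumes we already have some tokens labeled under both standards (for example, produced by tagging one corpus with a tagger trained on the other), estimates a conditional distribution P(target tag | source tag) from co-occurrence counts, and maps each source tag to its most probable target tag. The function names, the tag names, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def learn_correspondence(pairs):
    """Estimate P(target_tag | source_tag) from (source_tag, target_tag) pairs.

    Assumption: each pair comes from a token labeled under both standards.
    """
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        counts[src][tgt] += 1

    table = {}
    for src, tgt_counts in counts.items():
        total = sum(tgt_counts.values())
        # Relative frequency as a simple maximum-likelihood estimate.
        table[src] = {tgt: n / total for tgt, n in tgt_counts.items()}
    return table


def transform(tags, table):
    """Map each source tag to its most probable target tag.

    Source tags never seen during learning are left unchanged.
    """
    return [
        max(table[t], key=table[t].get) if t in table else t
        for t in tags
    ]


# Hypothetical toy data: (source-standard tag, target-standard tag).
pairs = [
    ("NN", "n"), ("NN", "n"), ("NN", "v"),
    ("VV", "v"), ("VV", "v"),
    ("JJ", "a"),
]
table = learn_correspondence(pairs)
converted = transform(["NN", "VV", "JJ", "XX"], table)
```

A context-free mapping like this ignores the surrounding words, which a sequence labeling approach would normally exploit; it is meant only to make the notion of a probabilistic tag-to-tag correspondence concrete.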