Voting between multiple data representations for text chunking

  • Authors:
  • Hong Shen; Anoop Sarkar

  • Affiliations:
  • School of Computing Science, Simon Fraser University, Burnaby, BC, Canada;School of Computing Science, Simon Fraser University, Burnaby, BC, Canada

  • Venue:
  • AI'05 Proceedings of the 18th Canadian Society conference on Advances in Artificial Intelligence
  • Year:
  • 2005

Abstract

This paper considers the hypothesis that voting between multiple data representations can be more accurate than voting between multiple learning models. This hypothesis has been considered before (cf. [San00]), but the focus was on voting methods rather than the data representations. In this paper, we focus on choosing specific data representations combined with simple majority voting. On the community standard CoNLL-2000 data set, using no additional knowledge sources apart from the training data, we achieved a 94.01 Fβ=1 score for arbitrary phrase identification, compared to the previous best Fβ=1 of 93.90. We also obtained a 95.23 Fβ=1 score for Base NP identification. Significance tests show that our Base NP identification score is significantly better than the previous comparable best Fβ=1 score of 94.22. Our main contribution is that our model is a fast, linear-time approach, whereas the previous best approach is significantly slower than our system.
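
As a rough illustration of the simple majority voting mentioned in the abstract, the sketch below combines per-token chunk tags produced by several taggers. It is not the authors' implementation: the function name `majority_vote`, the example tags, the assumption that every tagger's output has already been converted to one shared tag representation, and the tie-breaking rule (the earliest tagger in the list wins) are all assumptions made for this example, since the abstract does not specify them.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token chunk tags from several taggers by simple majority.

    Each inner sequence holds one tagger's output for the same sentence,
    already converted to a shared tag representation (an assumption here).
    Ties go to the tag proposed by the earliest tagger in the list
    (also an assumption; the abstract does not state a tie-breaking rule).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all tag sequences must have the same length")

    voted = []
    for token_tags in zip(*predictions):
        counts = Counter(token_tags)
        best = max(counts.values())
        # Scan in tagger order so the first tagger wins a tie.
        voted.append(next(t for t in token_tags if counts[t] == best))
    return voted


# Hypothetical outputs of three taggers for a four-token sentence.
outputs = [
    ["B-NP", "I-NP", "B-VP", "B-NP"],
    ["B-NP", "I-NP", "B-VP", "I-NP"],
    ["B-NP", "B-NP", "B-VP", "B-NP"],
]
print(majority_vote(outputs))  # ['B-NP', 'I-NP', 'B-VP', 'B-NP']
```

For a fixed number of taggers, this voting step visits each token once, so its cost grows linearly with sentence length.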