Labeling by landscaping: classifying tokens in context by pruning and decorating trees

  • Authors:
  • Siddharth Patwardhan;Branimir Boguraev;Apoorv Agarwal;Alessandro Moschitti;Jennifer Chu-Carroll

  • Affiliations:
  • IBM T. J. Watson Research Center, Yorktown Heights, NY, USA;IBM T. J. Watson Research Center, Yorktown Heights, NY, USA;Columbia University, New York, NY, USA;University of Trento, Trento, Italy;IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

  • Venue:
  • Proceedings of the 21st ACM international conference on Information and knowledge management
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

State-of-the-art approaches to token labeling within text documents typically cast the problem either as a classification task, without using complex structural characteristics of the input, or as a sequential labeling task, carried out by a Conditional Random Field (CRF) classifier. Here we explore principled ways for structure to be brought to bear on the task. In line with recent trends in statistical learning of structured natural language input, we use a Support Vector Machine (SVM) classification framework deploying tree kernels. We then propose tree transformations and decorations, as a methodology for modeling complex linguistic phenomena in highly multi-dimensional feature spaces. We develop a general purpose tree engineering framework, which enables us to transcend the typically complex and laborious process of feature engineering. We build kernel based classifiers for two token labeling tasks: fine-grained event recognition, and lexical answer type detection in questions. For both, we show that in comparison with a corresponding linear kernel SVM, our method of using tree kernels improves recognition, thanks to appropriately engineering tree structures for use by the tree kernel. We also observe significant improvements when comparing with a CRF-based realization of structured prediction, itself performing at levels comparable to state-of-the-art.