Exploiting unlabeled data for question classification

  • Authors:
  • David Tomás; Claudio Giuliano

  • Affiliations:
  • Department of Software and Computing Systems, University of Alicante, Spain; Human Language Technology Group, FBK-Irst, Italy

  • Venue:
  • NLDB'11: Proceedings of the 16th International Conference on Natural Language Processing and Information Systems
  • Year:
  • 2011

Abstract

In this paper, we introduce a kernel-based approach to question classification. We employ a kernel function based on latent semantic information acquired from Wikipedia, which allows external semantic knowledge to be included in the supervised learning process. By combining this knowledge with a bag-of-words approach by means of composite kernels, we obtained a highly effective question classifier. As the semantic information is acquired from unlabeled text, our system can be easily adapted to different languages and domains. We tested it on a parallel corpus of English and Spanish questions.
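
The idea of a composite kernel can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the latent semantic space is obtained by a truncated SVD of a term-count matrix built from unlabeled text, and that the composite kernel is the unweighted sum of two cosine-normalized kernels. The toy corpus and all function names (count_matrix, latent_projection, composite_kernel) are hypothetical and stand in for Wikipedia and a real question set.

```python
import numpy as np

def tokenize(text):
    # Naive whitespace tokenizer; an assumption for this toy example.
    return text.lower().split()

def count_matrix(texts, vocab):
    """Term-count matrix of shape (len(texts), len(vocab))."""
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(texts), len(vocab)))
    for r, text in enumerate(texts):
        for tok in tokenize(text):
            if tok in index:
                X[r, index[tok]] += 1.0
    return X

def unit_rows(M):
    """Scale each row to unit length; all-zero rows are left unchanged."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return M / np.where(norms == 0.0, 1.0, norms)

def latent_projection(unlabeled_counts, k):
    """Map from term space to a k-dimensional latent space (assumed: SVD)."""
    _, _, vt = np.linalg.svd(unlabeled_counts, full_matrices=False)
    return vt[:k].T  # shape (len(vocab), k)

def composite_kernel(X, projection):
    """Sum of a bag-of-words cosine kernel and a latent semantic cosine kernel."""
    bow = unit_rows(X)
    latent = unit_rows(X @ projection)
    return bow @ bow.T + latent @ latent.T

# Toy unlabeled text standing in for Wikipedia (hypothetical data).
unlabeled = [
    "city capital town",
    "capital city town",
    "actor film movie",
    "film movie actor",
]
questions = ["which city", "what town", "which actor"]

vocab = sorted({tok for t in unlabeled + questions for tok in tokenize(t)})
D = count_matrix(unlabeled, vocab)
X = count_matrix(questions, vocab)
P = latent_projection(D, k=2)
K = composite_kernel(X, P)
print(np.round(K, 3))
```

In this toy setting, "which city" and "what town" share no words, so their bag-of-words similarity is zero, yet the latent kernel relates them because "city" and "town" co-occur in the unlabeled text. A Gram matrix of this kind could then be passed to any learner that accepts precomputed kernels, for example scikit-learn's SVC with kernel="precomputed".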