Studying feature generation from various data representations for answer extraction

Authors:
Dan Shen;Geert-Jan M. Kruijff;Dietrich Klakow
Affiliations:
Saarland University, Postfach, Saarbruecken, Germany;Saarland University, Postfach, Saarbruecken, Germany;Saarland University, Postfach, Saarbruecken, Germany
Venue:
FeatureEng '05 Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing
Year:
2005

Abstract

This paper studies how to generate features for answer extraction from different data representations, such as surface text and parse trees. Beyond the features generated from surface text, we focus on feature generation from parse trees. We propose and compare three methods for representing syntactic features in Support Vector Machines: feature vectors, string kernels, and tree kernels. Experiments on the TREC question answering task show that features generated from the more structured representations significantly improve on the performance obtained with features generated from surface text. We also discuss the contribution of individual features in detail.
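
To make the idea of a tree kernel concrete, the following is a minimal sketch of a subtree-counting kernel in the style of Collins and Duffy. The abstract does not say which tree kernel the authors use, so the choice of kernel, the nested-tuple tree encoding, and the names `nodes`, `production`, `common_subtrees`, `tree_kernel`, and `decay` are all assumptions made for illustration, not the paper's implementation.

```python
# A parse tree is encoded as nested tuples (an assumption for this sketch):
# an internal node is (label, child, child, ...); a word is a plain string.

def nodes(tree):
    """Return every internal node of the tree, root first."""
    if isinstance(tree, str):
        return []
    found = [tree]
    for child in tree[1:]:
        found.extend(nodes(child))
    return found

def production(node):
    """Describe a node by its label and the labels of its direct children.

    Words and nonterminals are tagged separately so that a word "NP"
    is never confused with a nonterminal NP.
    """
    children = tuple(
        ("word", c) if isinstance(c, str) else ("node", c[0])
        for c in node[1:]
    )
    return (node[0],) + children

def common_subtrees(n1, n2, decay):
    """Weighted count of common subtree fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    # Matching productions guarantee the children align position by position.
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):
            score *= 1.0 + common_subtrees(c1, c2, decay)
    return score

def tree_kernel(t1, t2, decay=1.0):
    """Sum the common-fragment counts over all pairs of internal nodes."""
    return sum(
        common_subtrees(a, b, decay)
        for a in nodes(t1)
        for b in nodes(t2)
    )

# Two toy parse trees that differ in a single word.
t1 = ("S", ("NP", ("D", "the"), ("N", "dog")), ("VP", ("V", "barks")))
t2 = ("S", ("NP", ("D", "the"), ("N", "cat")), ("VP", ("V", "barks")))

print(tree_kernel(t1, t2))  # similarity between the two trees
print(tree_kernel(t1, t1))  # self-similarity, useful for normalization
```

A kernel value of this kind can be passed to an SVM in place of an explicit feature vector, which is what lets the learner compare parse trees without enumerating every subtree as a separate feature. Setting `decay` below 1 down-weights large fragments so that near-identical trees do not dominate the similarity scores.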