Machine learning for query formulation in question answering

Authors:
Christof Monz
Affiliations:
Informatics Institute, University of Amsterdam, Science Park 107, 1098 XG Amsterdam, The Netherlands. E-mail: c.monz@uva.nl
Venue:
Natural Language Engineering
Year:
2011


Abstract

Research on question answering dates back to the 1960s but has more recently been revisited as part of TREC's evaluation campaigns, where question answering is addressed as a subarea of information retrieval that focuses on specific answers to a user's information need. Whereas document retrieval systems aim to return the documents that are most relevant to a user's query, question answering systems aim to return actual answers to a user's question. Despite this difference, question answering systems rely on information retrieval components to identify documents that contain an answer to a user's question. The computationally more expensive answer extraction methods are then applied only to this subset of documents, which are likely to contain an answer. Because information retrieval methods are used to filter the documents in the collection, the performance of this component is critical: documents that are not retrieved are never analyzed by the answer extraction component. The formulation of the queries used for retrieving those documents has a strong impact on the effectiveness of the retrieval component. In this paper, we focus on predicting the importance of terms from the original question. We use model tree machine learning techniques to assign weights to query terms according to their usefulness for identifying documents that contain an answer. Term weights are learned by inspecting a large number of query formulation variations and their respective accuracy in identifying documents containing an answer. Several linguistic features are used for building the models, including part-of-speech tags, degree of connectivity in the dependency parse tree of the question, and ontological information. All of these features are extracted automatically using several natural language processing tools. Incorporating the learned weights into a state-of-the-art retrieval system results in statistically significant improvements in identifying answer-bearing documents.
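
The abstract does not spell out how the accuracy of query variations is turned into a per-term training signal. As one possible illustration only, and not the paper's actual procedure, the Python sketch below assumes that every non-empty subset of the question terms is a query variation and that a term's usefulness can be approximated by the mean effectiveness of variations that contain it minus the mean effectiveness of those that do not. The names `query_variants`, `term_gains`, and `toy_effectiveness`, and the toy utility values, are hypothetical. In a real setting the effectiveness function would run each variation against a retrieval engine and measure how well it ranks answer-bearing documents, and the resulting gains would serve as regression targets for a learner such as a model tree.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, Dict, FrozenSet, Iterator, Sequence


def query_variants(terms: Sequence[str]) -> Iterator[FrozenSet[str]]:
    """Yield every non-empty subset of the (distinct) question terms."""
    for size in range(1, len(terms) + 1):
        for combo in combinations(terms, size):
            yield frozenset(combo)


def term_gains(
    terms: Sequence[str],
    effectiveness: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Approximate each term's usefulness from the scores of all variants.

    Assumption (not from the paper): gain = mean score of variants that
    contain the term minus mean score of variants that do not.
    """
    scored = {variant: effectiveness(variant) for variant in query_variants(terms)}
    gains: Dict[str, float] = {}
    for term in terms:
        with_term = [s for v, s in scored.items() if term in v]
        without_term = [s for v, s in scored.items() if term not in v]
        # A single-term question has no variant without the term.
        baseline = mean(without_term) if without_term else 0.0
        gains[term] = mean(with_term) - baseline
    return gains


# Hypothetical stand-in for a retrieval run: each term contributes a fixed,
# made-up amount to the effectiveness of any variant that includes it.
TOY_UTILITY = {"everest": 0.5, "height": 0.25, "the": 0.0}


def toy_effectiveness(variant: FrozenSet[str]) -> float:
    return sum(TOY_UTILITY[term] for term in variant)


if __name__ == "__main__":
    for term, gain in term_gains(list(TOY_UTILITY), toy_effectiveness).items():
        print(f"{term}: {gain:+.3f}")
```

Exhaustive enumeration grows exponentially with the number of terms, so a sketch like this is only practical for short questions; longer questions would need sampling or a cap on the number of variations.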