Unsupervised question answering data acquisition from local corpora

Authors:
Lucian Vlad Lita; Jaime Carbonell
Affiliations:
Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA
Venue:
Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management
Year:
2004

Abstract

Data-driven approaches to question answering (QA) are increasingly common. Because training data for such approaches is very limited, we propose an unsupervised algorithm that generates high-quality question-answer pairs from local corpora. The algorithm is ontology-independent and requires only a very small amount of seed data as its starting point. Two alternating views of the data make learning possible: 1) question types are viewed as relations between entities, and 2) question types are described by their corresponding question-answer pairs. Together, these two views allow us to construct an unsupervised algorithm that acquires high-precision question-answer pairs. We show the quality of the acquired data for different question types and perform a task-based evaluation. With each iteration, the pairs acquired by the unsupervised algorithm are used as training data for a simple QA system. Performance increases with the number of question-answer pairs acquired, confirming the robustness of the unsupervised algorithm. We introduce the notion of semantic drift and show that it is a desirable quality in training data for question answering systems.
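
The alternation between the two views described in the abstract can be illustrated with a generic seed-based bootstrapping loop. The sketch below is not the authors' algorithm: the toy corpus, the single seed pair, the function names (`find_patterns`, `apply_patterns`, `bootstrap`), the capitalized-word entity matcher, and the `min_support` threshold are all assumptions made for illustration. It only shows the general idea of turning known pairs into surface patterns and then using those patterns to harvest new pairs.

```python
import re
from collections import Counter

# Hypothetical toy corpus and seed; the paper's corpora and seeds are not given here.
CORPUS = [
    "Mozart was born in Salzburg.",
    "Einstein was born in Ulm.",
    "Einstein, a native of Ulm, moved to Bern.",
    "Chopin, a native of Warsaw, settled in Paris.",
]
SEEDS = {("Mozart", "Salzburg")}

# Assumed entity shape: one capitalized word, or a four-digit year.
ENTITY = r"([A-Z][a-z]+)"
ANSWER = r"([A-Z][a-z]+|\d{4})"


def find_patterns(corpus, pairs):
    """Pairs -> patterns: collect the text between an entity and its answer."""
    patterns = Counter()
    for sentence in corpus:
        for entity, answer in pairs:
            i, j = sentence.find(entity), sentence.find(answer)
            if i == -1 or j == -1 or i >= j:
                continue
            middle = sentence[i + len(entity):j]
            if middle.strip():  # skip empty contexts, which would match anything
                patterns[middle] += 1
    return patterns


def apply_patterns(corpus, patterns):
    """Patterns -> pairs: harvest new (entity, answer) pairs from the corpus."""
    found = set()
    for middle in patterns:
        regex = re.compile(ENTITY + re.escape(middle) + ANSWER)
        for sentence in corpus:
            for match in regex.finditer(sentence):
                found.add((match.group(1), match.group(2)))
    return found


def bootstrap(corpus, seeds, iterations=2, min_support=1):
    """Alternate between the two views for a fixed number of iterations."""
    pairs = set(seeds)
    for _ in range(iterations):
        counts = find_patterns(corpus, pairs)
        kept = {p for p, c in counts.items() if c >= min_support}
        pairs |= apply_patterns(corpus, kept)
    return pairs


if __name__ == "__main__":
    print(sorted(bootstrap(CORPUS, SEEDS)))
```

On this toy input, the first iteration learns the context " was born in " from the seed and picks up the Einstein pair; the second iteration learns ", a native of " from that new pair and picks up the Chopin pair. A real system would need pattern scoring and filtering to keep precision high, which this sketch omits.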