Answering complex questions that require inference over, and synthesis of, information from multiple documents can be viewed as a form of topic-oriented, informative multi-document summarization, where the goal is to produce a single text as a compressed version of a set of documents with a minimum loss of relevant information. In this paper, we experiment with one empirical method and two unsupervised statistical machine learning techniques, K-means and Expectation Maximization (EM), for computing the relative importance of sentences, and we compare the results of these approaches. Our experiments show that the empirical approach outperforms the other two techniques and that EM performs better than K-means. However, the performance of these approaches depends entirely on the feature set used and on the weighting of those features. To measure each document sentence's importance and relevance to the user query, we extract several kinds of features (lexical, lexical-semantic, cosine similarity, basic element, and tree-kernel-based syntactic and shallow-semantic features). We use a local search technique to learn the weights of the features. To the best of our knowledge, no previous study has used tree kernel functions to encode syntactic and semantic information for more complex tasks such as computing the relatedness between the query sentences and the document sentences in order to generate query-focused summaries (i.e., answers to complex questions). For each of our methods of generating summaries (empirical, K-means, and EM), we show the effect of syntactic and shallow-semantic features over bag-of-words (BOW) features.
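The general idea of scoring sentences by a weighted combination of query-relevance features can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: it uses only two of the feature types mentioned (bag-of-words cosine similarity and lexical unigram overlap), and the feature weights here are hypothetical placeholders for the weights the paper learns via local search.

```python
# Illustrative sketch: rank document sentences against a query by a
# weighted sum of two simple features. Feature set and weights are
# assumptions for demonstration only.
from collections import Counter
import math

def tokens(text):
    return text.lower().split()

def cosine_sim(a, b):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def unigram_overlap(a, b):
    """Jaccard overlap of unigram sets (a simple lexical feature)."""
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def score(sentence, query, weights=(0.6, 0.4)):
    # Weighted feature combination; in the paper the weights are
    # learned by local search rather than fixed as here.
    return (weights[0] * cosine_sim(sentence, query)
            + weights[1] * unigram_overlap(sentence, query))

def rank(sentences, query):
    """Order sentences by descending relevance score to the query."""
    return sorted(sentences, key=lambda s: score(s, query), reverse=True)
```

For example, `rank(["the cat sat", "dogs bark loudly", "the cat ate fish"], "where did the cat sit")` places the sentences sharing query terms ahead of the unrelated one. Richer features (basic elements, tree kernels over parse trees) would slot in as additional terms in the weighted sum.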