Answering complex questions that require inference over, and synthesis of, information from multiple documents can be viewed as a form of topic-oriented, informative multi-document summarization, where the goal is to produce a single text as a compressed version of a set of documents with a minimum loss of relevant information. In this paper, we experiment with one empirical method and two unsupervised statistical machine learning techniques, K-means and Expectation Maximization (EM), for computing the relative importance of sentences, and we compare the results of these approaches. Our experiments show that the empirical approach outperforms the other two techniques and that EM performs better than K-means. However, the performance of these approaches depends entirely on the feature set used and on the weighting of those features. To measure each document sentence's importance and relevance to the user query, we extract several kinds of features (lexical, lexical-semantic, cosine similarity, basic element, and tree-kernel-based syntactic and shallow-semantic features). We use a local search technique to learn the weights of the features. To the best of our knowledge, no previous study has used tree kernel functions to encode syntactic and semantic information for more complex tasks such as computing the relatedness between the query sentences and the document sentences in order to generate query-focused summaries (i.e., answers to complex questions). For each of our methods of generating summaries (empirical, K-means, and EM), we show the effect of syntactic and shallow-semantic features over bag-of-words (BOW) features.
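The general idea of scoring sentences by a weighted combination of query-relevance features can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: it uses only two of the feature types mentioned (bag-of-words cosine similarity and lexical unigram overlap), and the feature weights here are hypothetical placeholders for the weights the paper learns via local search.

```python
# Illustrative sketch: rank document sentences against a query by a
# weighted sum of two simple features. Feature set and weights are
# assumptions for demonstration only.
from collections import Counter
import math

def tokens(text):
    return text.lower().split()

def cosine_sim(a, b):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def unigram_overlap(a, b):
    """Jaccard overlap of unigram sets (a simple lexical feature)."""
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def score(sentence, query, weights=(0.6, 0.4)):
    # Weighted feature combination; in the paper the weights are
    # learned by local search rather than fixed as here.
    return (weights[0] * cosine_sim(sentence, query)
            + weights[1] * unigram_overlap(sentence, query))

def rank(sentences, query):
    """Order sentences by descending relevance score to the query."""
    return sorted(sentences, key=lambda s: score(s, query), reverse=True)
```

For example, `rank(["the cat sat", "dogs bark loudly", "the cat ate fish"], "where did the cat sit")` places the sentences sharing query terms ahead of the unrelated one. Richer features (basic elements, tree kernels over parse trees) would slot in as additional terms in the weighted sum.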