IXIR: A statistical information distillation system

Authors:
Michael Levit;Dilek Hakkani-Tür;Gokhan Tur;Daniel Gillick
Affiliations:
International Computer Science Institute, Berkeley, CA, USA;International Computer Science Institute, Berkeley, CA, USA;SRI International, Menlo Park, CA, USA;International Computer Science Institute, Berkeley, CA, USA
Venue:
Computer Speech and Language
Year:
2009

Abstract

The task of information distillation is to extract snippets that are relevant to a given templated query from massive multilingual audio and textual document sources. We present an approach that focuses on the sentence extraction phase of the distillation process. It selects document sentences according to their relevance to a query via statistical classification with support vector machines. The distinguishing contribution of the approach is a novel method for generating classification features. The features are extracted from charts, which are compilations of elements from various annotation layers, such as word transcriptions, syntactic and semantic parses, and information extraction (IE) annotations. We describe a procedure for creating charts from documents and queries, paying special attention to query slots (free-text descriptions of names, organizations, topics, events and so on, around which templates are centered), and suggest various types of classification features that can be extracted from these charts. We observe a 30% relative improvement due to non-lexical annotation layers and perform a detailed analysis of the contribution of each of these layers to classification performance.
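
As a rough illustration of the idea of chart-based features, the sketch below represents a chart as one set of elements per annotation layer and computes, for each layer, how much of a query slot's content also appears in a candidate sentence. This is a minimal sketch under stated assumptions, not the system's actual feature set: the layer names, the `make_chart` and `overlap_features` helpers, and the annotation labels such as `GPE:Baghdad` are all hypothetical. In a full pipeline, feature vectors of this kind would be passed to a classifier such as a support vector machine.

```python
from typing import Dict, Iterable, Set

# Hypothetical simplification: a chart maps an annotation layer name
# to the set of elements found in that layer.
Chart = Dict[str, Set[str]]


def make_chart(words: Iterable[str],
               entities: Iterable[str] = (),
               predicates: Iterable[str] = ()) -> Chart:
    """Build a toy chart with three illustrative layers."""
    return {
        "word": {w.lower() for w in words},
        "entity": set(entities),
        "predicate": set(predicates),
    }


def overlap_features(sentence: Chart, slot: Chart) -> Dict[str, float]:
    """For each layer, the fraction of the slot's elements found in the sentence."""
    features: Dict[str, float] = {}
    for layer, slot_items in slot.items():
        if not slot_items:
            # An empty slot layer carries no evidence either way.
            features[layer + "_overlap"] = 0.0
            continue
        shared = slot_items & sentence.get(layer, set())
        features[layer + "_overlap"] = len(shared) / len(slot_items)
    return features


# Illustrative query slot and candidate sentence (annotations are made up).
slot = make_chart(["bombing", "in", "Baghdad"],
                  entities=["GPE:Baghdad"],
                  predicates=["attack"])
sentence = make_chart("A car bomb exploded in central Baghdad".split(),
                      entities=["GPE:Baghdad"],
                      predicates=["explode"])

features = overlap_features(sentence, slot)
```

In this toy example the lexical layer matches only partially, while the entity layer matches fully, which hints at why non-lexical layers can add information beyond word overlap.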