Sibyl, a factoid question-answering system for spoken documents

Authors:
Pere R. Comas; Jordi Turmo; Lluís Màrquez
Affiliations:
TALP Research Center, Technical University of Catalonia, Barcelona, Spain; TALP Research Center, Technical University of Catalonia, Barcelona, Spain; TALP Research Center, Technical University of Catalonia, Barcelona, Spain
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
2012

Abstract

In this article, we present a factoid question-answering system, Sibyl, specifically tailored for question answering (QA) on spoken-word documents. This work explores, for the first time, which techniques can be robustly adapted from the usual QA on written documents to the more difficult spoken document scenario. More specifically, we study new information retrieval (IR) techniques designed for speech, and utilize several levels of linguistic information for the speech-based QA task. These include named-entity detection with phonetic information, syntactic parsing applied to speech transcripts, and the use of coreference resolution. Sibyl is largely based on supervised machine-learning techniques, with special focus on the answer extraction step, and makes little use of handcrafted knowledge. Consequently, it should be easily adaptable to other domains and languages. Sibyl and all its modules are extensively evaluated on the European Parliament Plenary Sessions English corpus, comparing manual transcripts with automatic transcripts obtained by three different automatic speech recognition (ASR) systems that exhibit significantly different word error rates. This data belongs to the CLEF 2009 track for QA on speech transcripts. The main results confirm that syntactic information is very useful for learning to rank question candidates, improving results on both manual and automatic transcripts, unless the ASR quality is very low. At the same time, our experiments on coreference resolution reveal that state-of-the-art technology is not mature enough to be effectively exploited for QA with spoken documents. Overall, the performance of Sibyl is comparable to or better than the state of the art on this corpus, confirming the validity of our approach.