The creation and analysis of annotated corpora are fundamental to implementing natural language processing solutions based on machine learning. In this paper we present a parallel corpus of 4500 questions in Spanish and English in the tourism domain, obtained from real users. With the aim of training a question answering system, the questions were labeled with the expected answer type according to two different ontologies. The first is an open-domain ontology based on Sekine's Extended Named Entity Hierarchy, while the second is a restricted-domain ontology specific to the tourism field. Because the two ontologies have different characteristics, we had to resolve many problematic cases and adapt our annotation criteria to each one. We present an analysis of the domain coverage of these ontologies and the inter-annotator agreement results. Finally, we use a question classification system to evaluate the labeling of the corpus.