A Parallel Corpus Labeled Using Open and Restricted Domain Ontologies

Authors:
Ester Boldrini;Sergio Ferrández;Ruben Izquierdo;David Tomás;Jose Luis Vicedo
Affiliations:
Natural Language Processing and Information Systems Group Department of Software and Computing Systems, University of Alicante, Spain;Natural Language Processing and Information Systems Group Department of Software and Computing Systems, University of Alicante, Spain;Natural Language Processing and Information Systems Group Department of Software and Computing Systems, University of Alicante, Spain;Natural Language Processing and Information Systems Group Department of Software and Computing Systems, University of Alicante, Spain;Natural Language Processing and Information Systems Group Department of Software and Computing Systems, University of Alicante, Spain
Venue:
CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
Year:
2009

Abstract

The analysis and creation of annotated corpora are fundamental for implementing natural language processing solutions based on machine learning. In this paper we present a parallel corpus of 4500 questions in Spanish and English in the tourism domain, obtained from real users. With the aim of training a question answering system, the questions were labeled with the expected answer type according to two different ontologies. The first is an open domain ontology based on Sekine's Extended Named Entity Hierarchy, while the second is a restricted domain ontology specific to the tourism field. Because the two ontologies have different characteristics, we had to resolve many problematic cases and adapt our annotation to the characteristics of each one. We present an analysis of the domain coverage of these ontologies and the results of the inter-annotator agreement. Finally, we use a question classification system to evaluate the labeling of the corpus.
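
As an illustration of the inter-annotator agreement mentioned above, the following minimal sketch computes Cohen's kappa for two annotators who assign an expected answer type to the same questions. The abstract does not say which agreement measure the authors used, so the choice of kappa is an assumption. The function name `cohen_kappa` and the labels `LOCATION`, `PRICE` and `TIME` are hypothetical and are not taken from either ontology.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Assumption: the source does not name its agreement measure;
    kappa is used here only as a common example.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)

    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both annotators pick the same
    # label if each labels independently with their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical answer-type labels for four questions.
annotator_a = ["LOCATION", "PRICE", "LOCATION", "TIME"]
annotator_b = ["LOCATION", "PRICE", "TIME", "TIME"]
kappa = cohen_kappa(annotator_a, annotator_b)
```

With two ontologies, a measure like this would be computed once per ontology, since the label sets differ and agreement on one says nothing about agreement on the other.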