Automatic feature extraction for question classification based on dissimilarity of probability distributions

Authors:
David Tomás;José L. Vicedo;Empar Bisbal;Lidia Moreno
Affiliations:
Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Spain;Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Spain;Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Spain;Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Spain
Venue:
FinTAL'06 Proceedings of the 5th international conference on Advances in Natural Language Processing
Year:
2006

Abstract

Question classification is one of the first tasks carried out in a Question Answering system. In this paper we present a multilingual question classification system based on machine learning techniques, using Support Vector Machines to classify the questions. All the features needed to train and test the classifier are extracted automatically and without supervision from statistical information alone: we compare the Poisson distributions of single words in two plain-text corpora, one of questions and one of documents. The system therefore needs nothing but plain text for training, which makes the approach flexible and easy to adapt to new languages and domains. We have tested it on a bilingual corpus of questions in English and Spanish.
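The idea of comparing per-word Poisson distributions across a question corpus and a document corpus can be illustrated with a minimal sketch. The abstract does not say which dissimilarity measure the authors use, so the Kullback-Leibler divergence between two Poisson distributions is assumed here purely for illustration. The function names (`poisson_kl`, `word_rates`, `rank_question_words`), the whitespace tokenizer, and the additive smoothing constant are likewise hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def poisson_kl(lam_p, lam_q):
    """Closed-form KL divergence KL(Poisson(lam_p) || Poisson(lam_q))."""
    return lam_p * math.log(lam_p / lam_q) - lam_p + lam_q


def word_rates(texts, vocab, smoothing=0.5):
    """Estimate a Poisson rate per word as smoothed occurrences per token.

    Assumption: whitespace tokenization and additive smoothing, so that
    words unseen in one corpus still get a strictly positive rate.
    """
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: (counts[w] + smoothing) / total for w in vocab}


def rank_question_words(questions, documents, smoothing=0.5):
    """Rank words by how strongly their rate in questions departs from
    their rate in documents, keeping only words more frequent in questions."""
    vocab = {w for t in questions + documents for w in t.lower().split()}
    q_rates = word_rates(questions, vocab, smoothing)
    d_rates = word_rates(documents, vocab, smoothing)
    scores = {
        w: poisson_kl(q_rates[w], d_rates[w])
        for w in vocab
        if q_rates[w] > d_rates[w]
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Under these assumptions, words that are typical of questions (such as interrogative terms) rise to the top of the ranking without any labeled data, and the highest-ranked words could then serve as features for a classifier such as a Support Vector Machine. Only the two plain-text corpora are required as input, which is what would keep an approach of this kind language-independent.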