Why text segment classification based on part of speech feature selection

Authors:
Iulia Nagy; Katsuyuki Tanaka; Yasuo Ariki
Affiliations:
Kobe University, Kobe, Japan; Kobe University, Kobe, Japan; Kobe University, Kobe, Japan
Venue:
DS'10 Proceedings of the 13th international conference on Discovery science
Year:
2010

Abstract

The aim of our research is to develop a scalable, automatic why-question answering system for English based on a supervised method that uses part-of-speech analysis. The prior approach consisted of building a why-classifier using function words. This paper investigates the performance of combining supervised data mining methods with various feature selection strategies in order to obtain a more accurate why-classifier. Feature selection was performed a priori on the dataset to extract representative verbs and/or nouns and avoid the curse of dimensionality. LogitBoost and SVM were used for the classification process. Three methods of extending the initial "function words only" approach to handle context-dependent features are proposed and experimentally evaluated on various datasets. The first considers function words and context-independent adverbs; the second incorporates selected lemmatized verbs; the third contains selected lemmatized verbs and nouns. Experiments on web-extracted datasets showed that all methods performed better than the baseline, with slightly more reliable results for the third one.
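
The feature-set variants named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function-word list, the selected adverb/verb/noun sets, the variant names, and the `extract_features` helper are all hypothetical, and the input is assumed to be already tagged and lemmatized as (word, Penn Treebank-style tag, lemma) triples. It also assumes that each variant extends the function-word baseline independently, which the abstract does not state. The classification step (LogitBoost or SVM) is omitted.

```python
from collections import Counter

# Hypothetical, deliberately tiny function-word list (not the paper's list).
FUNCTION_WORDS = frozenset({"because", "since", "so", "due", "to", "the", "of", "that"})

# For each variant: POS-tag prefix -> feature kind added on top of function words.
# Variant names are illustrative assumptions.
VARIANTS = {
    "baseline": {},
    "adverbs": {"RB": "adv"},
    "verbs": {"VB": "verb"},
    "verbs_nouns": {"VB": "verb", "NN": "noun"},
}


def extract_features(tagged_tokens, variant, selected):
    """Count function words plus the pre-selected lemmas allowed by `variant`.

    tagged_tokens: iterable of (word, pos_tag, lemma) triples.
    selected: dict mapping a feature kind ("adv", "verb", "noun") to the set
              of lemmas kept by an a priori feature selection step.
    """
    kinds = VARIANTS[variant]
    counts = Counter()
    for word, pos, lemma in tagged_tokens:
        token = word.lower()
        if token in FUNCTION_WORDS:
            counts["fw=" + token] += 1
            continue
        for prefix, kind in kinds.items():
            # Keep a content word only if its lemma survived feature selection.
            if pos.startswith(prefix) and lemma in selected.get(kind, ()):
                counts[kind + "=" + lemma] += 1
                break
    return dict(counts)


# Toy pre-tagged segment and toy selected lemma sets (both made up).
segment = [
    ("The", "DT", "the"),
    ("flight", "NN", "flight"),
    ("was", "VBD", "be"),
    ("delayed", "VBN", "delay"),
    ("because", "IN", "because"),
    ("of", "IN", "of"),
    ("fog", "NN", "fog"),
]
selected = {
    "adv": {"consequently"},
    "verb": {"delay", "cause"},
    "noun": {"fog", "reason"},
}

for name in VARIANTS:
    print(name, extract_features(segment, name, selected))
```

Restricting content words to a small pre-selected set keeps the feature space close to the size of the function-word baseline, which is the motivation the abstract gives for performing feature selection before classification.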