Why text segment classification based on part of speech feature selection

  • Authors:
  • Iulia Nagy;Katsuyuki Tanaka;Yasuo Ariki

  • Affiliations:
  • Kobe University, Kobe, Japan;Kobe University, Kobe, Japan;Kobe University, Kobe, Japan

  • Venue:
  • DS'10 Proceedings of the 13th international conference on Discovery science
  • Year:
  • 2010

Abstract

The aim of our research is to develop a scalable automatic why-question answering system for English based on a supervised method that uses part-of-speech analysis. The prior approach consisted of building a why-classifier using function words. This paper investigates the performance of combining supervised data mining methods with various feature selection strategies in order to obtain a more accurate why-classifier. Feature selection was performed a priori on the dataset to extract representative verbs and/or nouns and to avoid the curse of dimensionality. LogitBoost and SVM were used for classification. Three methods of extending the initial "function words only" approach to handle context-dependent features are proposed and experimentally evaluated on various datasets. The first considers function words and context-independent adverbs; the second incorporates selected lemmatized verbs; the third contains selected lemmatized verbs and nouns. Experiments on web-extracted datasets showed that all methods performed better than the baseline, with slightly more reliable results for the third one.
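
The feature sets described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word lists, the toy lemma table, the names `build_vocabulary` and `extract_features`, and the assumption that each method extends the previous one cumulatively are all hypothetical, chosen only to show how a text segment might be turned into a count vector for a classifier such as LogitBoost or an SVM.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration; the paper's actual lists are not given here.
FUNCTION_WORDS = ["because", "since", "so", "to", "for", "as", "of", "the"]
ADVERBS = ["therefore", "hence", "consequently"]
SELECTED_VERBS = ["cause", "lead", "result"]
SELECTED_NOUNS = ["reason", "effect", "consequence"]

# Toy lemma table standing in for a real lemmatizer (assumption).
LEMMAS = {
    "causes": "cause", "caused": "cause",
    "leads": "lead", "led": "lead",
    "results": "result", "resulted": "result",
    "reasons": "reason", "effects": "effect",
    "consequences": "consequence",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(method):
    """Return the feature vocabulary for a method.

    0 = baseline (function words only), 1 = plus adverbs,
    2 = plus selected verbs, 3 = plus selected nouns.
    The cumulative layering is an assumption of this sketch.
    """
    vocabulary = list(FUNCTION_WORDS)
    if method >= 1:
        vocabulary += ADVERBS
    if method >= 2:
        vocabulary += SELECTED_VERBS
    if method >= 3:
        vocabulary += SELECTED_NOUNS
    return vocabulary


def extract_features(segment, vocabulary):
    """Count how often each vocabulary entry occurs in the lemmatized segment."""
    counts = Counter(LEMMAS.get(token, token) for token in tokenize(segment))
    return [counts[word] for word in vocabulary]


if __name__ == "__main__":
    segment = "The flood was caused by heavy rain"
    for method in range(4):
        vocabulary = build_vocabulary(method)
        print(method, extract_features(segment, vocabulary))
```

Restricting the vocabulary to a fixed, pre-selected list keeps the vector length small and independent of corpus size, which is the motivation the abstract gives for selecting features a priori.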