Extracting causal knowledge from a medical database using graphical patterns
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Enriching the knowledge sources used in a maximum entropy part-of-speech tagger
EMNLP '00 Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 13
Automatic detection of causal relations for Question Answering
MultiSumQA '03 Proceedings of the ACL 2003 workshop on Multilingual summarization and question answering - Volume 12
Evaluating discourse-based answer extraction for why-question answering
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
ACM Transactions on Asian Language Information Processing (TALIP)
Using syntactic information for improving why-question answering
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Developing an approach for why-question answering
EACL '06 Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
What is not in the bag of words for why-QA?
Computational Linguistics
Towards domain independent why text segment classification based on bag of function words
AI'12 Proceedings of the 25th Australasian joint conference on Advances in Artificial Intelligence
The aim of our research is to develop a scalable, automatic why-question answering system for English based on a supervised method that uses part-of-speech analysis. The prior approach consisted of building a why-classifier using function words. This paper investigates the performance of combining supervised data-mining methods with various feature-selection strategies in order to obtain a more accurate why-classifier. Feature selection was performed a priori on the dataset to extract representative verbs and/or nouns and to avoid the curse of dimensionality. LogitBoost and SVM were used for the classification process. Three methods of extending the initial "function words only" approach to handle context-dependent features are proposed and experimentally evaluated on various datasets. The first considers function words and context-independent adverbs; the second incorporates selected lemmatized verbs; the third adds selected lemmatized verbs and nouns. Experiments on web-extracted datasets showed that all three methods outperformed the baseline, with slightly more reliable results for the third.
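To make the bag-of-function-words representation concrete, the sketch below maps text segments to function-word count vectors and classifies them. It is a minimal illustration only: the word list, toy data, and helper names are hypothetical, and a simple nearest-centroid rule stands in for the LogitBoost and SVM classifiers actually used in the paper.

```python
from collections import Counter

# Illustrative, hypothetical subset of English function words;
# the paper's actual feature set is much larger.
FUNCTION_WORDS = ["because", "since", "so", "the", "a", "of", "to",
                  "why", "therefore", "due", "as", "for", "that", "this"]

def featurize(segment):
    """Map a text segment to a bag-of-function-words count vector."""
    counts = Counter(segment.lower().split())
    return [counts[w] for w in FUNCTION_WORDS]  # Counter returns 0 for absent words

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def classify(segment, centroids):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    vec = featurize(segment)
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(vec, centroids[label]))
    return min(centroids, key=dist)

# Toy training data: "why" segments are rich in causal function words.
train = {
    "why":   ["the engine failed because of a broken seal",
              "costs rose because of the recession"],
    "other": ["the meeting starts at noon",
              "open the door and enter the room"],
}
centroids = {label: centroid([featurize(s) for s in segs])
             for label, segs in train.items()}

print(classify("the crash happened because of a bug", centroids))  # → why
```

A real pipeline would replace the nearest-centroid rule with a trained classifier and, for the paper's extended methods, append counts of selected lemmatized verbs and nouns to the same feature vector.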