Mining Causality from Texts for Question Answering System

Authors:
Chaveevan Pechsiri;Asanee Kawtrakul
Affiliations:
-;-
Venue:
IEICE - Transactions on Information and Systems
Year:
2007

Abstract

This research aims to develop automatic mining of causality knowledge from texts to support an automatic question answering (QA) system in answering 'why' questions, which are among the most crucial forms of questions. The outcome of this research will assist people in diagnosing problems in areas such as plant diseases, health, and industry. While previous works have extracted causality knowledge within only one or two adjacent EDUs (Elementary Discourse Units), this research focuses on mining causality knowledge that exists within multiple EDUs, taking multiple causes and multiple effects into consideration, where adjacency between cause and effect is not required. There are two main problems: how to identify the causality events of interest in documents, and how to identify the boundaries of the causative unit and the effective unit in terms of multiple EDUs. In addition, at least three main problems are involved in boundary identification: the implicit boundary delimiter, the nonadjacent cause-consequence, and the effect surrounded by causes. This research proposes using verb-pair rules, learnt with the Naïve Bayes classifier (NB) and with the Support Vector Machine (SVM) and then compared, to identify causality EDUs in the Thai agricultural and health news domains. The boundary identification problems are solved by utilizing verb-pair rules, Centering Theory, and a cue phrase set. The reason for emphasizing verbs in extracting causality is that they explicitly express, in a certain way, the consequent events of cause-effect, e.g. 'Aphids suck the sap from rice leaves. Then leaves will shrink. Later, they will become yellow and dry.' The results show that the verb-pair rules extracted from NB outperform those extracted from SVM when the corpus contains a high occurrence of each verb, while the results from SVM are better than those from NB when the corpus contains a lower occurrence of each verb.
The verb-pair rules extracted from NB for causality extraction have the highest precision (0.88), with a recall of 0.75, on the plant disease corpus, whereas those extracted from SVM have the highest precision (0.89), with a recall of 0.76, on bird flu news. For boundary determination, our methodology performs very well, with approximately 96% accuracy. In addition, the causality results extracted in this research can be generalized as laws in the Inductive-Statistical theory of Hempel's explanation theory, which will be useful for QA and reasoning.
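
To make the verb-pair idea concrete, the following is a minimal sketch of a Naive Bayes classifier that labels a pair of verbs (one from a candidate cause EDU, one from a candidate effect EDU) as causal or non-causal. It is not the authors' implementation: the training pairs, the labels, the function names (train_nb, log_posteriors, classify), and the add-one smoothing are all assumptions made for illustration, and the English verbs stand in for the Thai data used in the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (verb in cause EDU, verb in effect EDU, label).
# Invented for illustration only; not taken from the paper's corpus.
TRAIN = [
    ("suck", "shrink", "causal"),
    ("suck", "dry", "causal"),
    ("infect", "die", "causal"),
    ("destroy", "shrink", "causal"),
    ("grow", "harvest", "non_causal"),
    ("plant", "grow", "non_causal"),
    ("report", "visit", "non_causal"),
    ("plant", "harvest", "non_causal"),
]


def train_nb(examples, alpha=1.0):
    """Count class frequencies and per-slot verb frequencies."""
    class_counts = Counter()
    feat_counts = defaultdict(Counter)  # (label, slot) -> verb counts
    vocab = {"cause": set(), "effect": set()}
    for cause_verb, effect_verb, label in examples:
        class_counts[label] += 1
        feat_counts[(label, "cause")][cause_verb] += 1
        feat_counts[(label, "effect")][effect_verb] += 1
        vocab["cause"].add(cause_verb)
        vocab["effect"].add(effect_verb)
    return {
        "class_counts": class_counts,
        "feat_counts": feat_counts,
        "vocab": vocab,
        "alpha": alpha,
    }


def log_posteriors(model, cause_verb, effect_verb):
    """Return unnormalized log P(label) + sum of log P(verb | label, slot)."""
    total = sum(model["class_counts"].values())
    alpha = model["alpha"]
    scores = {}
    for label, n in model["class_counts"].items():
        score = math.log(n / total)
        for slot, verb in (("cause", cause_verb), ("effect", effect_verb)):
            counts = model["feat_counts"][(label, slot)]
            # One extra vocabulary slot is reserved for unseen verbs.
            vocab_size = len(model["vocab"][slot]) + 1
            score += math.log((counts[verb] + alpha) / (n + alpha * vocab_size))
        scores[label] = score
    return scores


def classify(model, cause_verb, effect_verb):
    """Pick the label with the highest log score."""
    scores = log_posteriors(model, cause_verb, effect_verb)
    return max(scores, key=scores.get)


model = train_nb(TRAIN)
print(classify(model, "suck", "shrink"))
print(classify(model, "plant", "harvest"))
```

In this sketch, a verb pair that the classifier labels as causal would play the role of a candidate verb-pair rule. How rules are actually derived from NB or SVM, and how Centering Theory and cue phrases are applied to boundaries, is not specified in the abstract and is left out here.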