Backward chaining rule induction

  • Authors:
  • Douglas H. Fisher; Mary E. Edgerton; Zhihua Chen; Lianhong Tang; Lewis Frey

  • Affiliations:
  • (Corresponding e-mail: douglas.h.fisher@vanderbilt.edu) Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37235, USA; Department of Interdisciplinary Oncology, H. Lee Moffitt Cancer Center and Research Institute, SRB-3, 12902 Magnolia Drive, Tampa, FL 33612, USA; Department of Interdisciplinary Oncology, H. Lee Moffitt Cancer Center and Research Institute, SRB-3, 12902 Magnolia Drive, Tampa, FL 33612, USA; Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232, USA; Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232, USA

  • Venue:
  • Intelligent Data Analysis - Selected papers from IDA2005, Madrid, Spain
  • Year:
  • 2006

Abstract

Exploring the vast number of possible feature interactions in domains such as gene expression microarray data is an onerous task. We describe Backward-Chaining Rule Induction (BCRI) as a semi-supervised mechanism for biasing the search for IF-THEN rules that express plausible feature interactions. BCRI adds to a relatively limited tool-chest of hypothesis-generation software and is an alternative to purely unsupervised association-rule learning. We illustrate BCRI by using it to search for gene-to-gene causal mechanisms that underlie lung cancer. Mapping hypothesized gene interactions against prior knowledge offers support and explanations for those interactions, and suggests gaps in current knowledge that induction might help fill. Our assumption is that "good" hypotheses incrementally extend or revise existing knowledge. BCRI is implemented as a wrapper around a base supervised rule-learning algorithm. We summarize our prior work with an adaptation of C4.5 as the base algorithm (C45-BCRI) and extend it in the current study to use Brute as the base algorithm (Brute-BCRI). In contrast to C4.5's greedy strategy, Brute extensively searches the rule space. Moreover, Brute returns many more rules (i.e., hypothesized feature interactions) than does C4.5. To remain an effective hypothesis-generation tool, Brute-BCRI must therefore rank and prune hypothesized interactions more carefully than C45-BCRI does. Prior knowledge serves to evaluate final Brute-BCRI rules just as it does with C45-BCRI, but it also serves to evaluate and prune intermediate search states, thus maintaining a manageable number of rules for evaluation by a domain expert.
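
As a rough illustration of the wrapper idea, the following Python sketch shows one way a backward-chaining loop around a base rule learner could look. It assumes (the abstract does not spell this out) that backward chaining means starting from a top-level target such as a cancer label, learning rules for it, and then treating the features in those rules' antecedents as new targets. The names `toy_learner`, `backward_chain`, `keep`, and `max_depth`, the 75% agreement threshold, and the tiny binary dataset are all hypothetical; `toy_learner` is only a stand-in and is neither C4.5 nor Brute.

```python
from collections import deque
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Row = Dict[str, int]                # one sample: feature name -> 0/1 (assumed encoding)
Rule = Tuple[FrozenSet[str], str]   # (antecedent features, predicted feature)
Learner = Callable[[List[Row], str, Set[str]], List[Rule]]


def toy_learner(rows: List[Row], target: str, exclude: Set[str]) -> List[Rule]:
    """Hypothetical stand-in for a base rule learner (not C4.5 or Brute).

    Proposes a one-feature rule for every non-excluded feature whose value
    agrees with the target on at least 75% of the rows.
    """
    rules: List[Rule] = []
    for feat in sorted(rows[0]):
        if feat == target or feat in exclude:
            continue
        agreement = sum(r[feat] == r[target] for r in rows) / len(rows)
        if agreement >= 0.75:
            rules.append((frozenset([feat]), target))
    return rules


def backward_chain(
    rows: List[Row],
    top_target: str,
    learner: Learner,
    keep: Callable[[Rule], bool] = lambda rule: True,
    max_depth: int = 2,
) -> List[Rule]:
    """Wrapper sketch: learn rules for a target, then chain back to antecedents.

    `keep` is an assumed hook where prior knowledge could reject a rule
    before its antecedent features are expanded as new targets.
    """
    found: List[Rule] = []
    seen: Set[str] = {top_target}
    queue = deque([(top_target, 0)])
    while queue:
        target, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for rule in learner(rows, target, set(seen)):
            if not keep(rule):
                continue  # rejected rules are neither reported nor expanded
            found.append(rule)
            for feat in sorted(rule[0]):
                if feat not in seen:
                    seen.add(feat)
                    queue.append((feat, depth + 1))
    return found


# Made-up binary data: 8 samples, one class label and three "genes".
columns = {
    "cancer": [1, 1, 1, 1, 0, 0, 0, 0],
    "geneA":  [1, 1, 1, 0, 0, 0, 0, 0],
    "geneB":  [1, 1, 0, 0, 0, 0, 0, 1],
    "geneC":  [0, 1, 0, 1, 0, 1, 0, 1],
}
rows = [{name: values[i] for name, values in columns.items()} for i in range(8)]

rules = backward_chain(rows, "cancer", toy_learner)
for antecedent, target in rules:
    print("IF", " AND ".join(sorted(antecedent)), "THEN", target)
```

On this toy data the chain first proposes a rule from geneA to the cancer label and then a rule from geneB to geneA. Passing a `keep` predicate that rejects a rule stops the search from expanding that rule's antecedents, which loosely mirrors the idea of using prior knowledge to prune intermediate search states.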