Fast Discovery and the Generalization of Strong Jumping Emerging Patterns for Building Compact and Accurate Classifiers

  • Authors:
  • Hongjian Fan, Kotagiri Ramamohanarao


  • Venue:
  • IEEE Transactions on Knowledge and Data Engineering
  • Year:
  • 2006

Abstract

Classification of large data sets is an important data mining problem with wide applications. Jumping Emerging Patterns (JEPs) are itemsets whose support increases abruptly from zero in one data set to nonzero in another. In this paper, we propose a fast, accurate, and less complex classifier based on a subset of JEPs called Strong Jumping Emerging Patterns (SJEPs). The support constraint on SJEPs removes potentially less useful JEPs while retaining those with high discriminating power. Previous algorithms based on the manipulation of borders [1], as well as consEPMiner [2], cannot directly mine SJEPs. Here, we present a new tree-based algorithm for their efficient discovery. Experimental results show that: 1) the training of our classifier is typically 10 times faster than that of earlier approaches, 2) our classifier uses far fewer patterns than the JEP-Classifier [3] to achieve similar (and often improved) accuracy, and 3) in many cases, it is superior to other state-of-the-art classification systems such as Naive Bayes, CBA, C4.5, and bagged and boosted versions of C4.5. We argue that SJEPs are high-quality patterns that possess the strongest differentiating power and, as a consequence, represent sufficient information for the construction of accurate classifiers. In addition, we generalize these patterns by introducing Noise-tolerant Emerging Patterns (NEPs) and Generalized Noise-tolerant Emerging Patterns (GNEPs). Our tree-based algorithm can easily be adapted to discover these variations. We experimentally demonstrate that SJEPs, NEPs, and GNEPs are extremely useful for building effective classifiers that cope well with noise.
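
To make the pattern definitions above concrete, here is a minimal Python sketch that tests a candidate itemset against two transaction data sets. The helper names and the thresholds `min_sup` and `noise` are illustrative assumptions, not the paper's API, and the precise form of the SJEP and NEP constraints is hedged accordingly; GNEPs are omitted because the abstract does not spell out their form. The paper's contribution is a tree-based algorithm that discovers such patterns efficiently, rather than the brute-force per-itemset check shown here.

```python
# A minimal sketch of the pattern definitions from the abstract. The threshold
# values `min_sup` and `noise` are illustrative assumptions; the paper mines
# these patterns with an efficient tree-based algorithm rather than by testing
# candidate itemsets one at a time as done here.

def support(itemset, dataset):
    """Fraction of transactions in `dataset` containing every item in `itemset`."""
    if not dataset:
        return 0.0
    return sum(1 for t in dataset if itemset <= t) / len(dataset)

def is_jep(itemset, d_from, d_to):
    """JEP: support jumps abruptly from zero in d_from to nonzero in d_to."""
    return support(itemset, d_from) == 0.0 and support(itemset, d_to) > 0.0

def is_sjep(itemset, d_from, d_to, min_sup=0.05):
    """SJEP (assumed form): a JEP whose support in d_to also satisfies a
    minimum-support constraint, keeping only highly discriminating JEPs."""
    return support(itemset, d_from) == 0.0 and support(itemset, d_to) >= min_sup

def is_nep(itemset, d_from, d_to, noise=0.01, min_sup=0.05):
    """NEP (assumed form): relaxes the zero-support requirement so that a few
    noisy records in d_from do not disqualify an otherwise strong pattern."""
    return support(itemset, d_from) <= noise and support(itemset, d_to) >= min_sup

# Toy usage: {"a", "b"} never occurs in d1 but appears in 3 of 4 transactions
# of d2, so it qualifies as a JEP (and an SJEP, and an NEP) from d1 to d2.
d1 = [{"a", "c"}, {"b", "c"}, {"c", "d"}]
d2 = [{"a", "b"}, {"a", "b", "d"}, {"c", "d"}, {"a", "b", "c"}]
target = frozenset({"a", "b"})
print(is_jep(target, d1, d2), is_sjep(target, d1, d2), is_nep(target, d1, d2))
```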