Condensed representation of EPs and patterns quantified by frequency-based measures

  • Authors:
  • Arnaud Soulet; Bruno Crémilleux; François Rioult

  • Affiliations:
  • GREYC, CNRS – UMR 6072, Université de Caen, Caen, France (all three authors)

  • Venue:
  • KDID'04: Proceedings of the Third International Conference on Knowledge Discovery in Inductive Databases
  • Year:
  • 2004

Abstract

Emerging patterns (EPs) are associations of features whose frequencies increase significantly from one class to another. They have proven useful for building powerful classifiers and for supporting diagnosis. Because the search space is huge, mining and representing EPs is a difficult task on large datasets. Building on recent results on condensed representations of frequent closed patterns, we propose an exact condensed representation of EPs, that is, one from which all EPs and their growth rates can be recovered. From this condensed representation, we derive a method for extracting particularly interesting EPs, namely those with the highest growth rates. We call these EPs strong emerging patterns (SEPs). We also highlight a property that characterizes jumping emerging patterns. Experiments quantify the advantages of SEPs (there are fewer of them, and longer, less frequent patterns can be extracted) and demonstrate their usefulness: in a collaboration with Philips, SEPs made it possible to identify the failures of a silicon-plate production line. These concepts of condensed representation and “strong patterns” with respect to a measure are generalized to other interestingness measures based on frequencies.
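To make the notions of growth rate and jumping EP concrete, here is a minimal sketch on a hypothetical toy dataset. It assumes the usual definition of the growth rate of a pattern X from class D1 to class D2 as freq_D2(X) / freq_D1(X), with the conventions 0/0 = 0 and x/0 = ∞ for x > 0. X is an EP when its growth rate reaches a user-chosen threshold `rho`, and a jumping EP when it never occurs in D1 but does occur in D2. The `closure` helper illustrates one generic reason closed patterns can condense EPs. When the closure is taken over the whole dataset D1 ∪ D2, a pattern and its closure are contained in exactly the same transactions, so they share their frequencies in each class and therefore their growth rate. This is an illustration only, not the authors' construction; all data and function names here are hypothetical.

```python
from fractions import Fraction
from math import inf

# Hypothetical two-class toy dataset: each transaction is a frozenset of items.
D1 = [frozenset("abc"), frozenset("ab"), frozenset("bd")]                    # class 1
D2 = [frozenset("abc"), frozenset("ac"), frozenset("acd"), frozenset("ab")]  # class 2

def freq(pattern, dataset):
    """Relative frequency of `pattern` in a non-empty `dataset`."""
    return Fraction(sum(pattern <= t for t in dataset), len(dataset))

def growth_rate(pattern, d1, d2):
    """Growth rate from d1 to d2: freq_d2 / freq_d1, with 0/0 -> 0 and x/0 -> inf."""
    f1, f2 = freq(pattern, d1), freq(pattern, d2)
    if f1 == 0:
        return inf if f2 > 0 else 0
    return f2 / f1

def is_ep(pattern, d1, d2, rho):
    """Emerging pattern: growth rate at least the threshold `rho`."""
    return growth_rate(pattern, d1, d2) >= rho

def is_jumping(pattern, d1, d2):
    """Jumping EP: absent from d1 but present in d2 (infinite growth rate)."""
    return freq(pattern, d1) == 0 and freq(pattern, d2) > 0

def closure(pattern, dataset):
    """Intersection of all transactions containing `pattern`
    (all items of `dataset` if no transaction contains it)."""
    covering = [t for t in dataset if pattern <= t]
    if not covering:
        return frozenset().union(*dataset)
    return frozenset.intersection(*covering)

if __name__ == "__main__":
    rho = 2
    for p in [frozenset("c"), frozenset("ac"), frozenset("ad"), frozenset("b")]:
        print(sorted(p), growth_rate(p, D1, D2), is_ep(p, D1, D2, rho), is_jumping(p, D1, D2))
    # The closure of {c} over D1 + D2 is {a, c}; both patterns have the same growth rate.
    print(sorted(closure(frozenset("c"), D1 + D2)))
```

On this data, {c} and {a, c} both have growth rate 9/4 and are EPs for `rho = 2`, while {a, d} occurs only in class 2 and is therefore a jumping EP. Exact fractions are used so that growth rates can be compared without floating-point rounding.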