RP-Miner: a relaxed prune algorithm for frequent similar pattern mining

  • Authors:
  • Ansel Yoan Rodríguez-González; José Francisco Martínez-Trinidad; Jesús Ariel Carrasco-Ochoa; José Ruiz-Shulcloper

  • Affiliations:
  • Advanced Technologies Application Center, Data Mining Department, Siboney, Havana, Cuba and National Institute of Astrophysics, Optics and Electronics, Department of Computer Science, Tonantzintla ...; National Institute of Astrophysics, Optics and Electronics, Department of Computer Science, Tonantzintla, Puebla, Mexico; National Institute of Astrophysics, Optics and Electronics, Department of Computer Science, Tonantzintla, Puebla, Mexico; Advanced Technologies Application Center, Siboney, Havana, Cuba

  • Venue:
  • Knowledge and Information Systems
  • Year:
  • 2011

Abstract

Most current algorithms for mining frequent patterns assume that two object subdescriptions are similar only when they are equal, but many real-world problems evaluate similarity in other ways. Recently, three algorithms (ObjectMiner, STreeDC-Miner and STreeNDC-Miner) have been proposed for mining frequent patterns with similarity functions other than equality. To search for frequent patterns, ObjectMiner and STreeDC-Miner rely on a pruning property called the Downward Closure property, which the similarity function must satisfy. The STreeNDC-Miner algorithm was proposed for similarity functions that do not satisfy this property; however, it searches for frequent patterns by exploring all subsets of features, which can be very expensive. In this work, we propose a frequent similar pattern mining algorithm for similarity functions that do not satisfy the Downward Closure property. It is faster than STreeNDC-Miner and loses fewer frequent similar patterns than ObjectMiner and STreeDC-Miner. We also compare, in a supervised classification context, the quality of the set of frequent similar patterns computed by our algorithm with the quality of the sets computed by the other algorithms.
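The role of the Downward Closure property can be illustrated with a small sketch. The code below is not the RP-Miner algorithm or any of the algorithms named in the abstract. It is a toy example under stated assumptions: the dataset, the function names (`equal_sim`, `mean_diff_sim`, `frequency`, `downward_closure_violations`), the threshold of 4, and the definition of frequency as the fraction of objects similar to a given subdescription are all hypothetical choices made for illustration. It shows that equality never lets frequency grow when features are added, while a similarity function based on the mean absolute difference can.

```python
from itertools import combinations

# Hypothetical toy dataset: each object maps a feature name to a numeric value.
DATA = [
    {"a": 0, "b": 0},
    {"a": 0, "b": 6},
    {"a": 0, "b": 6},
    {"a": 20, "b": 20},
]


def equal_sim(x, y, features):
    """Classical setting: subdescriptions are similar only when they are equal."""
    return all(x[f] == y[f] for f in features)


def mean_diff_sim(x, y, features, threshold=4):
    """Assumed example of a non-equality similarity: the mean absolute
    difference over the selected features must not exceed the threshold."""
    diffs = [abs(x[f] - y[f]) for f in features]
    return sum(diffs) / len(diffs) <= threshold


def frequency(obj, features, sim, data=DATA):
    """Fraction of objects whose subdescription on `features` is similar to obj's."""
    return sum(sim(obj, other, features) for other in data) / len(data)


def downward_closure_violations(sim, data=DATA):
    """List (object index, subset, superset) triples where adding one feature
    makes the frequency increase, which contradicts Downward Closure.
    Checking supersets that add a single feature is enough, because any
    longer chain of additions is built from such single steps."""
    feats = sorted(data[0])
    found = []
    for i, obj in enumerate(data):
        for k in range(2, len(feats) + 1):
            for sup in combinations(feats, k):
                for sub in combinations(sup, k - 1):
                    if frequency(obj, sub, sim, data) < frequency(obj, sup, sim, data):
                        found.append((i, sub, sup))
    return found


if __name__ == "__main__":
    print("equality:", downward_closure_violations(equal_sim))
    print("mean difference:", downward_closure_violations(mean_diff_sim))
```

In this toy data, the subdescription of the first object on feature `b` alone has frequency 0.25, but on `a` and `b` together it has frequency 0.75. With a minimum frequency of 0.5, a miner that prunes by Downward Closure would discard `b` and never examine the superset, losing a frequent similar pattern; exploring every subset of features avoids the loss at a cost that grows exponentially with the number of features.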