Automatic discovery of locally frequent itemsets in the presence of highly frequent itemsets

  • Authors:
  • Ferenc Bodon;Ioannis N. Kouris;Christos H. Makris;Athanasios K. Tsakalidis

  • Affiliations:
  • Informatics Laboratory, Computer and Automation Research Institute, Hungarian Academy of Sciences and Department of Computer Science and Information Theory, Budapest University of Technology and E ...;Department of Computer Engineering and Informatics, University of Patras, School of Engineering, 26500 Patras, Hellas, Greece and Computer Technology Institute, P.O. BOX 1192, 26110 Patras, Hellas ...;Department of Computer Engineering and Informatics, University of Patras, School of Engineering, 26500 Patras, Hellas, Greece and Computer Technology Institute, P.O. BOX 1192, 26110 Patras, Hellas ...;Department of Computer Engineering and Informatics, University of Patras, School of Engineering, 26500 Patras, Hellas, Greece and Computer Technology Institute, P.O. BOX 1192, 26110 Patras, Hellas ...

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2005

Abstract

Many alternatives have been proposed for mining association rules that involve rare but 'interesting' itemsets in datasets that also contain highly frequent itemsets. However, all of the approaches suggested so far assume that we already know which itemsets are interesting and what the right support value for them is; none of them proposes a way of discovering such itemsets automatically. In this work we introduce the notion of locally frequent itemsets and argue, based on the opinion of field experts, that they form the largest and most frequently occurring category of rare but interesting itemsets, especially in commercial applications. We then propose two algorithms for finding and handling these itemsets. The main idea is to divide the database into partitions according to the needs of the problem and, besides searching for itemsets that are frequent in the whole database, to search also for itemsets that are frequent when considered within these partitions. Our approach proves very effective and also very efficient compared with traditional algorithms on both synthetic and real data.
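The partitioning idea described in the abstract can be illustrated with a small sketch. This is not either of the two algorithms proposed in the paper, whose details the abstract does not give; it is a brute-force illustration under stated assumptions. The function names `frequent_itemsets` and `locally_frequent_itemsets`, the `partition_key` callback, the use of a single relative support threshold for both the whole database and each partition, and the toy season-based data are all hypothetical choices made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: relative support} for itemsets meeting min_support.

    Brute-force enumeration of all itemsets up to max_size items.
    Suitable for a toy example only, not for real datasets.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    total = len(transactions)
    return {s: c / total for s, c in counts.items() if c / total >= min_support}


def locally_frequent_itemsets(records, partition_key, min_support, max_size=2):
    """Split records into partitions and mine each one separately.

    records: list of (metadata, items) pairs.
    partition_key: hypothetical callback mapping metadata to a partition name.
    Returns the globally frequent itemsets and, per partition, the itemsets
    that are frequent inside that partition but not in the whole database.
    """
    all_transactions = [items for _, items in records]
    global_frequent = frequent_itemsets(all_transactions, min_support, max_size)

    partitions = {}
    for metadata, items in records:
        partitions.setdefault(partition_key(metadata), []).append(items)

    local = {}
    for name, part in partitions.items():
        found = frequent_itemsets(part, min_support, max_size)
        # Keep only itemsets that the global pass missed.
        local[name] = {s: sup for s, sup in found.items() if s not in global_frequent}
    return global_frequent, local


# Assumed toy data: the partitioning criterion here is the season of the sale.
records = [
    ("winter", ["bread", "milk"]),
    ("winter", ["bread", "milk", "gloves"]),
    ("winter", ["bread", "gloves"]),
    ("winter", ["milk", "gloves"]),
    ("summer", ["bread", "milk"]),
    ("summer", ["bread", "milk"]),
    ("summer", ["bread", "milk", "sunscreen"]),
    ("summer", ["bread", "sunscreen"]),
    ("summer", ["milk", "bread"]),
    ("summer", ["bread", "milk", "sunscreen"]),
]

global_frequent, local = locally_frequent_itemsets(
    records, partition_key=lambda season: season, min_support=0.5
)
```

In this toy data, `gloves` appears in only 3 of 10 transactions overall and so misses the 0.5 threshold on the whole database, but it appears in 3 of the 4 winter transactions and is therefore reported as locally frequent for the `winter` partition. The highly frequent items `bread` and `milk` are found by the global pass and are excluded from the local results.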