Item set mining based on cover similarity

Authors:
Marc Segond; Christian Borgelt
Affiliations:
European Centre for Soft Computing, Mieres, Asturias, Spain;European Centre for Soft Computing, Mieres, Asturias, Spain
Venue:
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II
Year:
2011

Abstract

Standard frequent item set mining tries to find item sets whose support exceeds a user-specified threshold (minimum support) in a database of transactions. We instead strive to find item sets for which the similarity of their covers (that is, the sets of transactions containing them) exceeds a user-specified threshold. Starting from the generalized Jaccard index, we extend our approach to a total of twelve specific similarity measures and a generalized form. We present an efficient mining algorithm that is inspired by the well-known Eclat algorithm and its improvements. By reporting experiments on several benchmark data sets, we demonstrate that the runtime penalty incurred by the more complex (but also more informative) item set assessment is bearable and that the approach yields high-quality and more useful item sets.
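
To make the idea of cover similarity concrete, the following is a minimal sketch, not the authors' implementation. It assumes the generalized Jaccard index of an item set is the size of the intersection of its items' covers divided by the size of their union; the abstract does not spell out this definition. The function names (`build_covers`, `mine_by_cover_similarity`) and the toy data are hypothetical. The search is a plain depth-first, Eclat-style enumeration over transaction-id sets, and it omits the improvements the abstract alludes to.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_covers(transactions: Iterable[Iterable[str]]) -> Dict[str, Set[int]]:
    """Map each item to its cover: the ids of the transactions containing it."""
    covers: Dict[str, Set[int]] = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            covers.setdefault(item, set()).add(tid)
    return covers


def mine_by_cover_similarity(
    transactions: Iterable[Iterable[str]], min_similarity: float
) -> List[Tuple[Tuple[str, ...], float]]:
    """Report item sets (size >= 2) whose assumed generalized Jaccard index
    reaches min_similarity, using a depth-first, Eclat-style search."""
    covers = build_covers(transactions)
    items = sorted(covers)
    results: List[Tuple[Tuple[str, ...], float]] = []

    def extend(prefix: Tuple[str, ...], inter: Set[int], union: Set[int], start: int) -> None:
        for i in range(start, len(items)):
            cover = covers[items[i]]
            new_inter = inter & cover   # transactions containing all items
            new_union = union | cover   # transactions containing any item
            score = len(new_inter) / len(new_union)
            # Adding items can only shrink the intersection and grow the
            # union, so the score never increases: failing branches are cut.
            if score >= min_similarity:
                itemset = prefix + (items[i],)
                results.append((itemset, score))
                extend(itemset, new_inter, new_union, i + 1)

    for i, item in enumerate(items):
        extend((item,), covers[item], covers[item], i + 1)
    return results


if __name__ == "__main__":
    toy = [["a", "b"], ["a", "b"], ["a", "c"], ["b", "c"]]
    for itemset, score in mine_by_cover_similarity(toy, 0.5):
        print(itemset, score)
```

In the toy data, items `a` and `b` co-occur in two of the four transactions that contain either of them, so the pair scores 0.5, while both pairs involving `c` score 0.25 and are pruned. A threshold on support alone would not draw this distinction in the same way, since it ignores the transactions in which only one of the items appears.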