A data mining proxy approach for efficient frequent itemset mining

Authors:
Jeffrey Xu Yu;Zhiheng Li;Guimei Liu
Affiliations:
The Chinese University of Hong Kong, Hong Kong, China;The Chinese University of Hong Kong, Hong Kong, China;National University of Singapore, Singapore, Singapore
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2008

Citing 36
Cited 4

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
An effective hash-based algorithm for mining association rules

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Dynamic itemset counting and implication rules for market basket data

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Exploratory mining and pruning optimizations of constrained associations rules

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Efficiently mining long patterns from databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Optimization of constrained frequent set queries with 2-variable constraints

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Using a knowledge cache for interactive discovery of association rules

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Turbo-charging vertical mining of large databases

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Efficient search for association rules

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Depth first generation of long patterns

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Can we push more constraints into frequent pattern mining?

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
A tree projection algorithm for generation of frequent item sets

Journal of Parallel and Distributed Computing - Special issue on high-performance data mining
Scalable Algorithms for Association Mining

IEEE Transactions on Knowledge and Data Engineering
Pincer Search: A New Algorithm for Discovering the Maximum Frequent Set

EDBT '98 Proceedings of the 6th International Conference on Extending Database Technology: Advances in Database Technology
Mining Association Rules: Anti-Skew Algorithms

ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
Mining Frequent Item Sets with Convertible Constraints

Proceedings of the 17th International Conference on Data Engineering
MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases

Proceedings of the 17th International Conference on Data Engineering
H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
LPMiner: An Algorithm for Finding Frequent Itemsets Using Length-Decreasing Support Constraint

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Efficiently Mining Maximal Frequent Itemsets

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
An Efficient Algorithm for Mining Association Rules in Large Databases

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Sampling Large Databases for Association Rules

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
DualMiner: a dual-pruning algorithm for itemsets with constraints

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining frequent item sets by opportunistic projection

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Ascending Frequency Ordered Prefix-tree: Efficient Mining of Frequent Patterns

DASFAA '03 Proceedings of the Eighth International Conference on Database Systems for Advanced Applications
From Path Tree To Frequent Patterns: A Framework for Mining Frequent Patterns

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Mining Top.K Frequent Closed Patterns without Minimum Support

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Speed-up Iterative Frequent Itemset Mining with Constraint Changes

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Efficient Mining of Constrained Correlated Sets

ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Efficient Indexing Structures for Mining Frequent Patterns

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
OSSM: A Segmentation Approach to Optimize Frequency Counting

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
CLOSET+: searching for the best strategies for mining frequent closed itemsets

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Fast vertical mining using diffsets

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Efficient dynamic mining of constrained frequent sets

ACM Transactions on Database Systems (TODS)

Urban area characterization based on semantics of crowd activities in Twitter

GeoS'11 Proceedings of the 4th international conference on GeoSpatial semantics
Crowd-based urban characterization: extracting crowd behavioral patterns in urban areas from Twitter

Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Location-Based Social Networks
Crowd-sourced urban life monitoring: urban area characterization based crowd behavioral patterns from Twitter

Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication
Urban area characterization based on crowd behavioral lifelogs over Twitter

Personal and Ubiquitous Computing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Data mining has attracted a lot of research efforts during the past decade. However, little work has been reported on the efficiency of supporting a large number of users who issue different data mining queries periodically when there are new needs and when data is updated. Our work is motivated by the fact that the pattern-growth method is one of the most efficient methods for frequent pattern mining which constructs an initial tree and mines frequent patterns on top of the tree. In this paper, we present a data mining proxy approach that can reduce the I/O costs to construct an initial tree by utilizing the trees that have already been resident in memory. The tree we construct is the smallest for a given data mining query. In addition, our proxy approach can also reduce CPU cost in mining patterns, because the cost of mining relies on the sizes of trees. The focus of the work is to construct an initial tree efficiently. We propose three tree operations to construct a tree. With a unique coding scheme, we can efficiently project subtrees from on-disk trees or in-memory trees. Our performance study indicated that the data mining proxy significantly reduces the I/O cost to construct trees and CPU cost to mine patterns over the trees constructed.