Fast vertical mining using diffsets

Authors:
Mohammed J. Zaki; Karam Gouda
Affiliations:
Rensselaer Polytechnic Institute, Troy, NY; Faculty of Science, Benha, Egypt
Venue:
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2003

Abstract

A number of vertical mining algorithms have recently been proposed for association mining. They have been shown to be very effective and usually outperform horizontal approaches. The main advantage of the vertical format is that it supports fast frequency counting via intersection operations on transaction ids (tids) and automatically prunes irrelevant data. The main problem with these approaches arises when the intermediate vertical tid lists become too large to fit in memory, which limits the algorithms' scalability.

In this paper we present a novel vertical data representation called Diffset, which keeps track only of the differences between the tids of a candidate pattern and those of its generating frequent patterns. We show that diffsets drastically reduce the memory required to store intermediate results, and that incorporating diffsets into previous vertical mining methods significantly improves their performance.
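The abstract describes the idea without formulas. The following is a minimal in-memory sketch contrasting tidset-based support counting with a diffset-based variant. For a prefix P, write t(X) for the tidset of X and define the diffset d(PX) = t(P) - t(PX), so that σ(PX) = σ(P) - |d(PX)|. Because t(PXY) = t(PX) ∩ t(PY), it follows that d(PXY) = t(PX) - t(PY) = d(PY) - d(PX) and σ(PXY) = σ(PX) - |d(PXY)|. The toy database, the minimum support, and all names (`DB`, `MINSUP`, `vertical`, `eclat`, `declat`, `declat_from`) are hypothetical. This is an illustration of the representation, not the authors' implementation.

```python
from typing import Dict, FrozenSet, List, Tuple

Itemset = Tuple[str, ...]

# Hypothetical toy database in horizontal format: tid -> items.
DB: Dict[int, set] = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}
MINSUP = 3  # hypothetical absolute minimum support


def vertical(db: Dict[int, set]) -> Dict[str, FrozenSet[int]]:
    """Convert tid -> items into item -> tidset t(X)."""
    tidsets: Dict[str, set] = {}
    for tid, items in db.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return {item: frozenset(tids) for item, tids in tidsets.items()}


def eclat(prefix: Itemset, cls: List[Tuple[str, FrozenSet[int]]],
          out: Dict[Itemset, int]) -> None:
    """Tidset search: class members are (X, t(PX)); support is |t(PX)|."""
    for i, (x, tx) in enumerate(cls):
        px = prefix + (x,)
        out[px] = len(tx)
        child = []
        for y, ty in cls[i + 1:]:
            txy = tx & ty  # t(PXY) = t(PX) & t(PY)
            if len(txy) >= MINSUP:
                child.append((y, txy))
        eclat(px, child, out)


def declat_from(prefix: Itemset, cls: List[Tuple[str, FrozenSet[int], int]],
                out: Dict[Itemset, int]) -> None:
    """Diffset search: class members are (X, d(PX), sigma(PX))."""
    for i, (x, dx, sx) in enumerate(cls):
        px = prefix + (x,)
        out[px] = sx
        child = []
        for y, dy, _ in cls[i + 1:]:
            dxy = dy - dx        # d(PXY) = d(PY) - d(PX)
            sxy = sx - len(dxy)  # sigma(PXY) = sigma(PX) - |d(PXY)|
            if sxy >= MINSUP:
                child.append((y, dxy, sxy))
        declat_from(px, child, out)


def declat(cls: List[Tuple[str, FrozenSet[int]]], out: Dict[Itemset, int]) -> None:
    """Start from single-item tidsets and switch to diffsets at level 2."""
    for i, (x, tx) in enumerate(cls):
        out[(x,)] = len(tx)
        child = []
        for y, ty in cls[i + 1:]:
            dxy = tx - ty             # d(XY) = t(X) - t(Y)
            sxy = len(tx) - len(dxy)  # sigma(XY) = sigma(X) - |d(XY)|
            if sxy >= MINSUP:
                child.append((y, dxy, sxy))
        declat_from((x,), child, out)


# Frequent single items in a fixed (alphabetical) order.
items = sorted(((x, t) for x, t in vertical(DB).items() if len(t) >= MINSUP),
               key=lambda pair: pair[0])

eclat_out: Dict[Itemset, int] = {}
declat_out: Dict[Itemset, int] = {}
eclat((), items, eclat_out)
declat(items, declat_out)

print(len(eclat_out), eclat_out == declat_out)  # prints: 19 True
```

Both searches walk the same prefix classes, so they report identical supports; the diffset version never materializes t(PXY) below the first level, which is where the memory savings would come from on dense data. On sparse data a difference such as t(PX) - t(PY) can be larger than the intersection itself, so the savings are not guaranteed.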