Efficient pattern mining on shared memory systems: implications for chip multiprocessor architectures

Authors:
Gregory Buehrer;Yen-Kuang Chen;Srinivasan Parthasarathy;Anthony Nguyen;Amol Ghoting;Daehyun Kim
Affiliations:
The Ohio State University, Columbus, OH;Intel Corporation, Santa Clara, CA;The Ohio State University, Columbus, OH;Intel Corporation, Santa Clara, CA;The Ohio State University, Columbus, OH;Intel Corporation, Santa Clara, CA
Venue:
Proceedings of the 2006 workshop on Memory system performance and correctness
Year:
2006

Citing 25
Cited 1

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
An effective hash-based algorithm for mining association rules

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Beyond market baskets: generalizing association rules to correlations

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Parallel sequence mining on shared-memory machines

Journal of Parallel and Distributed Computing - Special issue on high-performance data mining
Approaches to parallel graph-based knowledge discovery

Journal of Parallel and Distributed Computing - Special issue on high-performance data mining
Hoard: a scalable memory allocator for multithreaded applications

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Parallel data mining for association rules on shared memory systems

Knowledge and Information Systems
Parallel Mining of Association Rules

IEEE Transactions on Knowledge and Data Engineering
Scalable Parallel Data Mining for Association Rules

IEEE Transactions on Knowledge and Data Engineering
Using Processor Affinity in Loop Scheduling on Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Mining Sequential Patterns

ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
Frequent Subgraph Discovery

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Efficiently Mining Maximal Frequent Itemsets

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
LOGML: Log Markup Language for Web Usage Mining

WEBKDD '01 Revised Papers from the Third International Workshop on Mining Web Log Data Across All Customers Touch Points
Mining Molecular Fragments: Finding Relevant Substructures of Molecules

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Efficient Discovery of Common Substructures in Macromolecules

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
gSpan: Graph-Based Substructure Pattern Mining

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Active data mining in a distributed setting

Active data mining in a distributed setting
Parallel algorithms for mining frequent structural motifs in scientific data

Proceedings of the 18th annual international conference on Supercomputing
A quickstart in frequent structure mining can make a difference

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
A sampling-based framework for parallel data mining

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Cache-conscious frequent pattern mining on a modern processor

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Predicting protein folding pathways

Bioinformatics
Hardware-modulated parallelism in chip multiprocessors

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05

Performance Issues in Parallelizing Data-Intensive Applications on a Multi-core Cluster

CCGRID '09 Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid

Quantified Score

Hi-index	0.00

Visualization

Abstract

Frequent pattern mining is a fundamental data mining process which has practical applications ranging from market basket data analysis to web link analysis. In this work, we show that state-of-the-art frequent pattern mining applications are inefficient when executing on a shared memory multiprocessor system, due primarily to poor utilization of the memory hierarchy. To improve the efficiency of these applications, we explore memory performance improvements, task partitioning strategies, and task queuing models designed to maximize the scalability of pattern mining on SMP systems. Empirically, we show that the proposed strategies afford significantly improved performance. We also discuss implications of this work in light of recent trends in micro-architecture design, particularly chip multiprocessors (CMPs).