Cache-conscious frequent pattern mining on modern and emerging processors

Authors:
Amol Ghoting;Gregory Buehrer;Srinivasan Parthasarathy;Daehyun Kim;Anthony Nguyen;Yen-Kuang Chen;Pradeep Dubey
Affiliations:
Department of Computer Science and Engineering, The Ohio State University, USA;Department of Computer Science and Engineering, The Ohio State University, USA;Department of Computer Science and Engineering, The Ohio State University, USA;Applications Research Laboratory, Corporate Technology Group, Intel Corporation, USA;Applications Research Laboratory, Corporate Technology Group, Intel Corporation, USA;Applications Research Laboratory, Corporate Technology Group, Intel Corporation, USA;Applications Research Laboratory, Corporate Technology Group, Intel Corporation, USA
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2007

Citing 36
Cited 8

Software prefetching

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Design and evaluation of a compiler algorithm for prefetching

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
CustoMalloc: efficient synthesized memory allocators

Software—Practice & Experience
An effective hash-based algorithm for mining association rules

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Beyond market baskets: generalizing association rules to correlations

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Efficiently mining long patterns from databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
An analysis of database workload performance on simultaneous multithreaded processors

Proceedings of the 25th annual international symposium on Computer architecture
Efficient mining of emerging patterns: discovering trends and differences

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Making B+- trees cache conscious in main memory

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Improving index performance through prefetching

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Real world performance of association rule algorithms

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Parallel data mining for association rules on shared memory systems

Knowledge and Information Systems
Discovery of Frequent Episodes in Event Sequences

Data Mining and Knowledge Discovery
Parallel Mining of Association Rules

IEEE Transactions on Knowledge and Data Engineering
Scalable Parallel Data Mining for Association Rules

IEEE Transactions on Knowledge and Data Engineering
Mining Sequential Patterns

ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases

Proceedings of the 17th International Conference on Data Engineering
Efficiently Mining Maximal Frequent Itemsets

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Scalable Techniques for Mining Causal Structures

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Cache Conscious Indexing for Decision-Support in Main Memory

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
DBMSs on a Modern Processor: Where Does Time Go?

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Weaving Relations for Cache Performance

Proceedings of the 27th International Conference on Very Large Data Bases
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
An Efficient Algorithm for Mining Association Rules in Large Databases

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Cache Conscious Algorithms for Relational Query Processing

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Cache-Oblivious Algorithms

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Cache-oblivious B-trees

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Performance and Memory-Access Characterization of Data Mining Applications

WWC '98 Proceedings of the Workload Characterization: Methodology and Case Studies
Memory Characterization of a Parallel Data Mining Workload

WWC '98 Proceedings of the Workload Characterization: Methodology and Case Studies
Efficient Mining of Partial Periodic Patterns in Time Series Database

ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Improving Hash Join Performance through Prefetching

ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Improving database performance on simultaneous multithreading processors

VLDB '05 Proceedings of the 31st international conference on Very large data bases
A characterization of data mining algorithms on a modern processor

DaMoN '05 Proceedings of the 1st international workshop on Data management on new hardware

PFunc: modern task parallelism for modern high performance computing

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Cache conscious trees: how do they perform on contemporary commodity microprocessors?

ICCSA'07 Proceedings of the 2007 international conference on Computational science and its applications - Volume Part I
Fast and compact hash tables for integer keys

ACSC '09 Proceedings of the Thirty-Second Australasian Conference on Computer Science - Volume 91
Engineering scalable, cache and space efficient tries for strings

The VLDB Journal — The International Journal on Very Large Data Bases
Redesigning the string hash table, burst trie, and BST to exploit cache

Journal of Experimental Algorithmics (JEA)
A parallel algorithm for computing borders

Proceedings of the 20th ACM international conference on Information and knowledge management
Cache conscious trees on modern microprocessors

Proceedings of the 4th International Conference on Uniquitous Information Management and Communication
OLTP in wonderland: where do cache misses come from in major OLTP components?

Proceedings of the Ninth International Workshop on Data Management on New Hardware

Quantified Score

Hi-index	0.00

Visualization

Abstract

Algorithms are typically designed to exploit the current state of the art in processor technology. However, as processor technology evolves, said algorithms are often unable to derive the maximum achievable performance on these modern architectures. In this paper, we examine the performance of frequent pattern mining algorithms on a modern processor. A detailed performance study reveals that even the best frequent pattern mining implementations, with highly efficient memory managers, still grossly under-utilize a modern processor. The primary performance bottlenecks are poor data locality and low instruction level parallelism (ILP). We propose a cache-conscious prefix tree to address this problem. The resulting tree improves spatial locality and also enhances the benefits from hardware cache line prefetching. Furthermore, the design of this data structure allows the use of path tiling, a novel tiling strategy, to improve temporal locality. The result is an overall speedup of up to 3.2 when compared with state of the art implementations. We then show how these algorithms can be improved further by realizing a non-naive thread-based decomposition that targets simultaneously multi-threaded processors (SMT). A key aspect of this decomposition is to ensure cache re-use between threads that are co-scheduled at a fine granularity. This optimization affords an additional speedup of 50%, resulting in an overall speedup of up to 4.8. The proposed optimizations also provide performance improvements on SMPs, and will most likely be beneficial on emerging processors.