In this paper, we examine the performance of frequent pattern mining algorithms on a modern processor. A detailed performance study reveals that even the best frequent pattern mining implementations, with highly efficient memory managers, still grossly under-utilize a modern processor. The primary performance bottlenecks are poor data locality and low instruction-level parallelism (ILP). To address these bottlenecks, we propose a cache-conscious prefix tree. The resulting tree improves spatial locality and also increases the benefit of hardware cache-line prefetching. Furthermore, the design of this data structure permits a novel tiling strategy that improves temporal locality. The result is an overall speedup of up to 3.2x over state-of-the-art implementations. We then show how these algorithms can be improved further through a non-naive thread-based decomposition that targets simultaneous multithreading (SMT) processors. A key aspect of this decomposition is ensuring cache reuse between threads that are co-scheduled at a fine granularity. This optimization affords an additional speedup of 50%, for an overall speedup of up to 4.8x. To the best of our knowledge, this effort is the first to target cache-conscious data mining.
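The abstract does not spell out the layout of the cache-conscious prefix tree, so the following is only a minimal sketch of the general idea, under stated assumptions: a pointer-based prefix tree is rebuilt as parallel arrays in depth-first order, so that a node and its first child sit next to each other and a count scan walks memory sequentially. All names here (`Node`, `build_prefix_tree`, `flatten_dfs`, `item_support`, `prefix_path`) are hypothetical and are not taken from the paper. Python shows the layout only; the locality benefit itself would have to be measured in a compiled language.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Pointer-based prefix-tree node (the cache-unfriendly starting point)."""
    item: int
    count: int = 0
    children: dict = field(default_factory=dict)


def build_prefix_tree(transactions):
    """Insert each transaction as a sorted path, counting shared prefixes."""
    root = Node(item=-1)  # -1 marks the root; assumed not to be a real item id
    for transaction in transactions:
        node = root
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root


def flatten_dfs(root):
    """Copy the tree into parallel arrays in depth-first (preorder) order.

    Assumption for illustration: storing nodes contiguously in traversal
    order stands in for the spatial-locality goal described in the abstract.
    """
    items, counts, parents = [], [], []
    stack = [(root, -1)]
    while stack:
        node, parent_idx = stack.pop()
        idx = len(items)
        items.append(node.item)
        counts.append(node.count)
        parents.append(parent_idx)
        # Push children in reverse so the smallest item is visited first.
        for item in sorted(node.children, reverse=True):
            stack.append((node.children[item], idx))
    return items, counts, parents


def item_support(items, counts, item):
    """Support of a single item: one sequential pass over the flat arrays."""
    return sum(c for i, c in zip(items, counts) if i == item)


def prefix_path(items, parents, idx):
    """Items on the path from the root down to node idx (root excluded)."""
    path = []
    while parents[idx] != -1:
        path.append(items[idx])
        idx = parents[idx]
    return path[::-1]
```

A typical use would be to call `flatten_dfs(build_prefix_tree(transactions))` once after the tree is built and then run all counting passes against the flat arrays. Parallel arrays are used here rather than an array of records so that a pass touching only `items` and `counts` does not pull parent indices into cache; whether that trade-off pays off depends on the access pattern.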