Mining frequent subtrees in a database of rooted and labeled trees is an important problem in many domains, ranging from phylogenetic analysis to biochemistry and from linguistic parsing to XML data analysis. In this work we revisit this problem and develop an architecture-conscious solution targeting emerging multicore systems. Specifically, we identify a sequence of memory-related optimizations that significantly improve the spatial and temporal locality of a state-of-the-art sequential algorithm, alleviating the effects of memory latency. Additionally, these optimizations are shown to reduce the pressure on the front-side bus, an important consideration in the context of large-scale multicore architectures. We then demonstrate that these optimizations, while necessary, are not sufficient for efficient parallelization on multicores, primarily due to parametric and data-driven factors that make load balancing a significant challenge. To address this challenge, we present a methodology that adaptively and automatically modulates the type and granularity of the work being shared among different cores. The resulting algorithm achieves near-perfect parallel efficiency on up to 16 processors on challenging real-world applications. The optimizations we present have general-purpose utility, and a key outcome is the development of a general-purpose scheduling service for moldable task scheduling on emerging multicore systems.
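To make the idea of adaptively modulating work granularity concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes a hypothetical, deliberately skewed search space (`children`) standing in for pattern enumeration, and a hypothetical `low_water` threshold: a worker hands newly generated subtasks to the shared queue only when that queue is running low (other cores may be starving), and otherwise keeps them on its private stack to preserve locality. The sketch adapts only the granularity of shared work, not its type. All names here are illustrative assumptions.

```python
import threading
from collections import deque


def children(node, max_depth=6):
    """Hypothetical skewed search space: nodes ending in 0 (and the root) fan out more."""
    if len(node) >= max_depth:
        return []
    fanout = 3 if (not node or node[-1] == 0) else 1
    return [node + (i,) for i in range(fanout)]


def count_sequential(root):
    """Reference baseline: count every node reachable from root with one thread."""
    stack, visited = [root], 0
    while stack:
        node = stack.pop()
        visited += 1
        stack.extend(children(node))
    return visited


def explore(root, num_workers=4, low_water=2):
    """Count nodes in parallel, sharing subtasks only when the shared queue is low."""
    shared = deque([root])
    cond = threading.Condition()
    # pending = tasks that are queued or currently being processed
    state = {"pending": 1, "visited": 0}

    def worker():
        while True:
            with cond:
                while not shared and state["pending"] > 0:
                    cond.wait()
                if not shared:          # nothing queued and nothing in flight
                    return
                task = shared.popleft()
            local, seen = [task], 0
            while local:
                node = local.pop()
                seen += 1
                kids = children(node)
                if not kids:
                    continue
                with cond:
                    hungry = len(shared) < low_water
                    if hungry:
                        # Fine granularity: expose subtasks to other cores.
                        shared.extend(kids)
                        state["pending"] += len(kids)
                        cond.notify_all()
                if not hungry:
                    # Coarse granularity: keep subtasks local for locality.
                    local.extend(kids)
            with cond:
                state["visited"] += seen
                state["pending"] -= 1
                if state["pending"] == 0:
                    cond.notify_all()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["visited"]
```

Calling `explore(())` should visit the same number of nodes as `count_sequential(())`, regardless of how the work ends up divided among threads. In CPython the global interpreter lock prevents real speedup for this CPU-bound loop, so the sketch illustrates only the scheduling decision, not performance.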