Mining tree-structured data on multicore systems

Authors:
Shirish Tatikonda;Srinivasan Parthasarathy
Affiliations:
The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH
Venue:
Proceedings of the VLDB Endowment
Year:
2009

Abstract

Mining frequent subtrees in a database of rooted and labeled trees is an important problem in many domains, ranging from phylogenetic analysis to biochemistry and from linguistic parsing to XML data analysis. In this work we revisit this problem and develop an architecture-conscious solution targeting emerging multicore systems. Specifically, we identify a sequence of memory-related optimizations that significantly improve the spatial and temporal locality of a state-of-the-art sequential algorithm, alleviating the effects of memory latency. Additionally, these optimizations are shown to reduce the pressure on the front-side bus, an important consideration in the context of large-scale multicore architectures. We then demonstrate that these optimizations, while necessary, are not sufficient for efficient parallelization on multicores, primarily due to parametric and data-driven factors that make load balancing a significant challenge. To address this challenge, we present a methodology that adaptively and automatically modulates the type and granularity of the work being shared among different cores. The resulting algorithm achieves near-perfect parallel efficiency on up to 16 processors on challenging real-world applications. The optimizations we present have general-purpose utility, and a key outcome is the development of a general-purpose scheduling service for moldable task scheduling on emerging multicore systems.
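
The idea of adapting the granularity of shared work can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no implementation details, so everything below is an assumption made for illustration. The names `count_labels`, `adaptive_count`, and `min_chunk` are hypothetical, simple label counting stands in for frequent subtree mining, and only granularity (not the type of work) is adapted. A worker that sees an empty task queue assumes other workers may be idle and hands back half of its remaining range.

```python
import queue
import threading
from collections import Counter


def count_labels(tree):
    """Count node labels in one tree given as (label, [children])."""
    counts = Counter()
    stack = [tree]
    while stack:
        label, children = stack.pop()
        counts[label] += 1
        stack.extend(children)
    return counts


def adaptive_count(trees, num_workers=4, min_chunk=1):
    """Count labels over all trees, splitting work when the queue runs dry."""
    tasks = queue.Queue()
    tasks.put((0, len(trees)))  # start with one coarse task: the whole range
    total = Counter()
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work
                tasks.task_done()
                return
            lo, hi = item
            local = Counter()
            while lo < hi:
                # An empty queue suggests idle workers, so share the upper
                # half of the remaining range as a new, finer-grained task.
                if tasks.empty() and hi - lo > min_chunk:
                    mid = (lo + hi) // 2
                    tasks.put((mid, hi))
                    hi = mid
                local += count_labels(trees[lo])
                lo += 1
            with lock:
                total.update(local)
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    tasks.join()  # wait until every task, including split-off ones, is done
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    sample = [("a", [("b", []), ("c", [("b", [])])])] * 10
    print(adaptive_count(sample, num_workers=3))
```

Because of CPython's global interpreter lock, this sketch shows only the scheduling logic and will not exhibit the parallel speedups reported in the abstract.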