Mining tree-structured data on multicore systems

  • Authors:
  • Shirish Tatikonda;Srinivasan Parthasarathy

  • Affiliations:
  • The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH

  • Venue:
  • Proceedings of the VLDB Endowment
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Mining frequent subtrees in a database of rooted and labeled trees is an important problem in many domains, ranging from phylogenetic analysis to biochemistry and from linguistic parsing to XML data analysis. In this work we revisit this problem and develop an architecture conscious solution targeting emerging multicore systems. Specifically we identify a sequence of memory related optimizations that significantly improve the spatial and temporal locality of a state-of-the-art sequential algorithm -- alleviating the effects of memory latency. Additionally, these optimizations are shown to reduce the pressure on the front-side bus, an important consideration in the context of large-scale multicore architectures. We then demonstrate that these optimizations while necessary are not sufficient for efficient parallelization on multicores, primarily due to parametric and data-driven factors which make load balancing a significant challenge. To address this challenge, we present a methodology that adaptively and automatically modulates the type and granularity of the work being shared among different cores. The resulting algorithm achieves near perfect parallel efficiency on up to 16 processors on challenging real world applications. The optimizations we present have general purpose utility and a key out-come is the development of a general purpose scheduling service for moldable task scheduling on emerging multicore systems.