A distributed placement service for graph-structured and tree-structured data

Authors:
Gregory Buehrer;Srinivasan Parthasarathy;Shirish Tatikonda
Affiliations:
Microsoft, Redmond, CA, USA;The Ohio State University, Columbus, OH, USA;The Ohio State University, Columbus, OH, USA
Venue:
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Year:
2010

Abstract

Effective data placement strategies can enhance the performance of data-intensive applications implemented on high-end computing clusters. Such strategies can have a significant impact on localizing the computation, minimizing synchronization (communication) costs, enhancing reliability (via strategic replication policies), and ensuring a balanced workload or enhancing the available bandwidth from massive storage devices (e.g., disk arrays). Existing work has largely targeted the placement of relatively simple data types or entities (e.g., elements, vectors, sets, and arrays). Here we investigate several hash-based distributed data placement methods targeting tree- and graph-structured data, and develop a locality-enhancing placement service for large cluster systems. Target applications include the placement of a single large graph (e.g., the Web graph), a single large tree (e.g., a large XML file), a forest of graphs or trees (e.g., an XML database), and other specialized graph types such as bipartite graphs (e.g., query-click graphs) and directed acyclic graphs. We empirically evaluate our service by demonstrating its use in improving mining executions for pattern discovery, nearest neighbor searching, graph computations, and applications that combine link and content analysis.
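
The abstract does not spell out the hashing schemes it evaluates, so the following is only an illustrative sketch of what a hash-based, locality-enhancing placement for graph vertices could look like. It assumes min-wise hashing over each vertex's adjacency set as the locality-sensitive step; that choice, along with the names `stable_hash`, `minhash_signature`, `place`, and `place_graph`, is hypothetical and not taken from the paper.

```python
import hashlib


def stable_hash(value: str, seed: int) -> int:
    """Deterministic 64-bit hash; the seed selects one hash function of a family."""
    data = f"{seed}:{value}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(neighbors, num_hashes=2):
    """Min-wise signature of an adjacency set (assumed technique, not the paper's).

    Vertices whose neighbor sets overlap heavily are likely to share a signature.
    """
    if not neighbors:
        return tuple([0] * num_hashes)  # isolated vertices share one signature
    return tuple(
        min(stable_hash(str(n), seed) for n in neighbors)
        for seed in range(num_hashes)
    )


def place(neighbors, num_nodes, num_hashes=2):
    """Map a vertex to a cluster node by hashing its min-wise signature."""
    signature = minhash_signature(neighbors, num_hashes)
    # A seed outside range(num_hashes) keeps this hash distinct from the ones above.
    return stable_hash(repr(signature), seed=num_hashes) % num_nodes


def place_graph(adjacency, num_nodes):
    """Assign every vertex of an adjacency-list graph to a cluster node."""
    return {v: place(nbrs, num_nodes) for v, nbrs in adjacency.items()}


if __name__ == "__main__":
    # Toy graph: "a" and "b" link to the same pages, so they are co-located.
    graph = {"a": {"x", "y", "z"}, "b": {"x", "y", "z"}, "c": {"q"}}
    print(place_graph(graph, num_nodes=4))
```

In this sketch, a larger `num_hashes` makes co-location stricter (only very similar neighborhoods collide), while a smaller value groups more loosely related vertices. A real service would also need to address load balance and replication, which are not modeled here.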