I/O conscious algorithm design and systems support for data analysis on emerging architectures

Authors:
G. Buehrer;A. Ghoting;Xi Zhang;S. Tatikonda;S. Parthasarathy;T. Kurc;J. Saltz
Affiliations:
The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH
Venue:
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Year:
2006

Citing 26
Cited 0

Parallel database systems: the future of high performance database systems

Communications of the ACM
C4.5: programs for machine learning

C4.5: programs for machine learning
Server-directed collective I/O in Panda

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The Vesta parallel file system

ACM Transactions on Computer Systems (TOCS)
The galley parallel file system

ICS '96 Proceedings of the 10th international conference on Supercomputing
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Parallel I/O for high performance computing

Parallel I/O for high performance computing
Real world performance of association rule algorithms

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Models and issues in data stream systems

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Parallel data mining for association rules on shared memory systems

Knowledge and Information Systems
MPI-IO/GPFS, an optimized implementation of MPI-IO on top of GPFS

Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Passion: Optimized I/O for Parallel Applications

Computer
Mining Sequential Patterns

ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
Frequent Subgraph Discovery

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Efficient Progressive Sampling for Association Rules

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
MS-I/O: A Distributed Multi-Storage I/O System

CCGRID '02 Proceedings of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid
Mining distance-based outliers in near linear time with randomization and a simple pruning rule

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining Frequent Itemsets from Secondary Memory

ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Cache-conscious frequent pattern mining on a modern processor

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Design of a next generation sampling service for large scale data analysis applications

Proceedings of the 19th annual international conference on Supercomputing
A characterization of data mining algorithms on a modern processor

DaMoN '05 Proceedings of the 1st international workshop on Data management on new hardware
Disk-directed I/O for MIMD multiprocessors

OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
PVFS: a parallel file system for linux clusters

ALS'00 Proceedings of the 4th annual Linux Showcase & Conference - Volume 4
Design and analysis of a multi-dimensional data sampling service for large scale data analysis applications

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Advances in data collection and storage technologies have given rise to large dynamic data stores. In order to effectively manage and mine such stores on modern and emerging architectures, one must consider both designing effective middleware support and re-architecting algorithms, to derive performance that commensurates with technological advances. In this article, we present a topdown view of how one can achieve this goal for next generation data analysis centers. Specifically, we present a case study on frequent pattern algorithms, and show how such algorithms can be re-structured to be cache, memory and I/O conscious. Furthermore, motivated by such algorithms, we present a services oriented middleware framework for the derivation of high performance on next generation architectures.