Applications that use collections of very large, distributed datasets have become an increasingly important part of science and engineering. As high-performance wide-area networks become more pervasive, there is interest in making collective use of distributed computational and data resources. Recent work has converged on the notion of the Grid, which attempts to uniformly present a heterogeneous collection of distributed resources. Current Grid research covers many areas, from low-level infrastructure issues to high-level application concerns. However, providing support for efficient exploration and processing of very large scientific datasets stored in distributed archival storage systems remains a challenging research issue.

We have initiated an effort that focuses on developing efficient data-intensive applications in a Grid environment. In this paper, we present a framework, called filter-stream programming, that represents the processing units of a data-intensive application as a set of filters, which are designed to be efficient in their use of memory and scratch space. We describe a prototype infrastructure that supports execution of applications using the proposed framework. We present the implementation of two applications using the filter-stream programming framework, and discuss experimental results demonstrating the effects of heterogeneous resources on application performance.
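The filter-stream idea can be illustrated with a minimal sketch. This is not the paper's prototype infrastructure or its API. It only assumes that each filter is a processing unit that consumes bounded-size buffers from an input stream and emits buffers on an output stream, so that no filter holds the whole dataset in memory. All names here (`read_filter`, `select_filter`, `sum_filter`, `run_pipeline`, `buffer_size`) are hypothetical and chosen for illustration.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical buffer type: a bounded-size chunk of data elements.
Buffer = List[int]


def read_filter(dataset: Iterable[int], buffer_size: int) -> Iterator[Buffer]:
    """Source filter: emits the dataset as buffers of at most buffer_size items."""
    buf: Buffer = []
    for item in dataset:
        buf.append(item)
        if len(buf) == buffer_size:
            yield buf
            buf = []
    if buf:  # flush the final, possibly partial buffer
        yield buf


def select_filter(stream: Iterator[Buffer],
                  predicate: Callable[[int], bool]) -> Iterator[Buffer]:
    """Intermediate filter: keeps only the elements that satisfy predicate."""
    for buf in stream:
        kept = [x for x in buf if predicate(x)]
        if kept:  # skip buffers that became empty
            yield kept


def sum_filter(stream: Iterator[Buffer]) -> int:
    """Sink filter: reduces the stream to one value, one buffer at a time."""
    total = 0
    for buf in stream:
        total += sum(buf)
    return total


def run_pipeline(dataset: Iterable[int], buffer_size: int = 4) -> int:
    """Connects the three filters with streams (here, Python generators)."""
    stream = read_filter(dataset, buffer_size)
    stream = select_filter(stream, lambda x: x % 2 == 0)
    return sum_filter(stream)


if __name__ == "__main__":
    print(run_pipeline(range(10)))
```

Because the filters communicate only through streams of small buffers, the peak memory of this sketch depends on `buffer_size` rather than on the dataset size. In a Grid setting the filters would run on different hosts; here they run in a single process purely to show the decomposition.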