In many scientific disciplines, especially long-running, data-intensive collaborations, it is important to track all aspects of data capture, production, transformation, and analysis. In principle, one can then audit, validate, reproduce, and/or re-run various data transformations with corrections. We have recently proposed and prototyped the Chimera virtual data system, a new database-driven approach to this problem. Here we present a major application study in which we apply Chimera to a challenging data analysis problem: the identification of galaxy clusters within the Sloan Digital Sky Survey. We describe the problem, its computational procedures, and the use of Chimera to plan and orchestrate a workflow of thousands of tasks on a data grid comprising hundreds of computers. This experience suggests that a general set of tools can indeed enhance the accuracy and productivity of scientific data reduction, and that further development and application of this paradigm will offer great value.
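The abstract does not describe Chimera's interfaces. The following is only a minimal Python sketch of the general virtual-data idea. It assumes a hypothetical catalog that records each derivation as a transformation applied to named input files that produces named output files. The catalog then plans a requested data product by walking that record backwards into a dependency DAG and skipping files that already exist. The class, field, and file names (`Derivation`, `VirtualDataCatalog`, `plan`, `prep_1`, `clusters`, and so on) are illustrative assumptions, not Chimera's API.

```python
"""Hypothetical virtual-data catalog sketch: record derivations, then plan re-derivation."""
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Derivation:
    """One recorded application of a transformation (hypothetical schema)."""
    name: str                   # unique id of this derivation
    transformation: str         # program that was run
    inputs: tuple[str, ...]     # logical file names consumed
    outputs: tuple[str, ...]    # logical file names produced


class VirtualDataCatalog:
    """Maps each logical file to the single derivation recorded as producing it."""

    def __init__(self) -> None:
        self._producer: dict[str, Derivation] = {}

    def add(self, dv: Derivation) -> None:
        # Reject the whole derivation if any output already has a producer.
        for out in dv.outputs:
            if out in self._producer:
                raise ValueError(f"{out!r} is already produced by {self._producer[out].name!r}")
        for out in dv.outputs:
            self._producer[out] = dv

    def producer_of(self, lfn: str) -> Derivation:
        # Provenance lookup: which recorded derivation yields this file?
        try:
            return self._producer[lfn]
        except KeyError:
            raise KeyError(f"no recorded derivation produces {lfn!r}") from None

    def plan(self, target: str, materialized: set[str]) -> list[str]:
        """Return the derivations needed to (re)produce `target`, in a runnable order.

        Files in `materialized` already exist and are not recomputed.
        """
        graph: dict[str, set[str]] = {}  # derivation -> derivations it depends on
        stack, seen = [target], set()
        while stack:
            lfn = stack.pop()
            if lfn in seen or lfn in materialized:
                continue
            seen.add(lfn)
            dv = self.producer_of(lfn)
            deps = graph.setdefault(dv.name, set())
            for inp in dv.inputs:
                if inp not in materialized:
                    deps.add(self.producer_of(inp).name)
                    stack.append(inp)
        # static_order() yields each derivation only after all of its predecessors.
        return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    # Illustrative three-stage pipeline over two sky fields (all names are made up).
    catalog = VirtualDataCatalog()
    catalog.add(Derivation("prep_1", "prep", ("raw_field_1",), ("field_1",)))
    catalog.add(Derivation("prep_2", "prep", ("raw_field_2",), ("field_2",)))
    catalog.add(Derivation("search", "search", ("field_1", "field_2"), ("candidates",)))
    catalog.add(Derivation("coalesce", "coalesce", ("candidates",), ("clusters",)))
    # Suppose field_1 was derived earlier; only the remaining steps need to run.
    print(catalog.plan("clusters", {"raw_field_1", "raw_field_2", "field_1"}))
```

In this sketch, the same recorded derivations serve two purposes. Auditing uses `producer_of` to ask where a file came from. Re-running uses `plan` to regenerate a product with only the missing steps. A real system would hand the resulting ordered tasks to a grid workflow executor rather than returning a list.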