Data-intensive architecture for scientific knowledge discovery

Authors:
Malcolm Atkinson;Chee Sun Liew;Michelle Galea;Paul Martin;Amrey Krause;Adrian Mouat;Oscar Corcho;David Snelling
Affiliations:
School of Informatics, University of Edinburgh, Edinburgh, UK EH8 9AB;Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia 50603;School of Informatics, University of Edinburgh, Edinburgh, UK EH8 9AB;School of Informatics, University of Edinburgh, Edinburgh, UK EH8 9AB;EPCC, University of Edinburgh, Edinburgh, UK EH9 3JZ;EPCC, University of Edinburgh, Edinburgh, UK EH9 3JZ;Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Boadilla del Monte, Spain 28660;Fujitsu Laboratories of Europe Limited, Hayes, UK UB4 8FE
Venue:
Distributed and Parallel Databases
Year:
2012

Abstract

This paper presents a data-intensive architecture that supports applications from a wide range of domains, as well as the different types of users involved in defining, designing and executing data-intensive processing tasks. The prototype architecture is introduced, and the pivotal role of DISPEL as its canonical language is explained. The architecture promotes the exploration and exploitation of distributed and heterogeneous data, and it spans the complete knowledge discovery process, from data preparation through analysis to evaluation and reiteration. The architecture was evaluated with large-scale applications from astronomy, cosmology, hydrology, functional genetics, image processing and seismology.