Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS).
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data.
New sampling-based summary statistics for improving approximate query answers. SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data.
NiagaraCQ: a scalable continuous query system for Internet databases. SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data.
Models and issues in data stream systems. Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
Continuously adaptive continuous queries over streams. Proceedings of the 2002 ACM SIGMOD international conference on Management of data.
TelegraphCQ: continuous dataflow processing. Proceedings of the 2003 ACM SIGMOD international conference on Management of data.
Aurora: a new model and architecture for data stream management. The VLDB Journal (The International Journal on Very Large Data Bases).
PSoup: a system for streaming queries over streaming data. The VLDB Journal (The International Journal on Very Large Data Bases).
Proceedings of the 2007 ACM SIGMOD international conference on Management of data.
SPADE: the System S declarative stream processing engine. Proceedings of the 2008 ACM SIGMOD international conference on Management of data.
Proceedings of the Third ACM International Conference on Distributed Event-Based Systems.
Facilitating fine grained data provenance using temporal data model. Proceedings of the Seventh International Workshop on Data Management for Sensor Networks.
PIKM 2010: ACM workshop for Ph.D. students in information and knowledge management. CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management.
Emerging multidisciplinary research across database management systems. ACM SIGMOD Record.
One of the major requirements for e-science applications handling sensor data is reproducibility of results. Several optimization and scalability challenges arise while keeping reproducibility of results guaranteed. First, the various data streams need to be coordinated to optimize the accuracy and processing of the results. Second, because of the high volume of streaming data and the series of processing steps performed on that data, the demand for disk space may grow unacceptably high. Last, reproducibility in a decentralized scenario may be difficult to achieve because of data replication. This paper introduces and addresses these challenges, which arise when optimizing the process of achieving reproducibility of results.
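The disk-space concern above can be made concrete with a small sketch in the spirit of the temporal data model cited in the list: instead of persisting every intermediate result, each sensor reading is stored once with its arrival timestamp, and a processing step logs only the time window it consumed; reproducing a past result then means replaying that window. All names and the windowed-mean operator here are illustrative assumptions, not the paper's actual design.

```python
import bisect
from dataclasses import dataclass

@dataclass
class Reading:
    ts: float     # arrival timestamp (the temporal attribute)
    value: float  # sensor value

class TemporalStore:
    """Append-only store of readings, ordered by arrival timestamp."""
    def __init__(self):
        self._ts = []    # sorted timestamps
        self._rows = []  # readings, parallel to _ts

    def append(self, r: Reading):
        # Stream tuples arrive in timestamp order, so append keeps _ts sorted.
        self._ts.append(r.ts)
        self._rows.append(r)

    def window(self, start: float, end: float):
        # Readings with start <= ts < end, found by binary search.
        lo = bisect.bisect_left(self._ts, start)
        hi = bisect.bisect_left(self._ts, end)
        return self._rows[lo:hi]

def windowed_mean(store: TemporalStore, start: float, end: float) -> float:
    # A stand-in for any deterministic processing step over a window.
    rows = store.window(start, end)
    return sum(r.value for r in rows) / len(rows)

store = TemporalStore()
for ts, v in [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]:
    store.append(Reading(ts, v))

# Original run: the operator logs only its input window [1.0, 3.5)
# as provenance, rather than materializing its output.
first = windowed_mean(store, 1.0, 3.5)

# Reproduction: replaying the logged window over the raw readings
# regenerates the same result without extra stored intermediates.
again = windowed_mean(store, 1.0, 3.5)
assert first == again == 20.0
```

The trade-off this illustrates is the one the abstract raises: logging windows keeps disk usage proportional to the raw stream, but reproducibility then depends on the raw readings (and, in a decentralized setting, their replicas) remaining consistent.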