In scientific cloud workflows, large amounts of application data need to be stored in distributed data centres. To store these data effectively, a data manager must intelligently select the data centres in which they will reside. Not all data can be placed freely, however: some datasets must remain at a fixed location. When a task needs several datasets located in different data centres, moving large volumes of data between those centres becomes a challenge. In this paper, we propose a matrix-based k-means clustering strategy for data placement in scientific cloud workflows. The strategy contains two algorithms: one groups the existing datasets into k data centres during the workflow build-time stage, and the other dynamically assigns newly generated datasets to the most appropriate data centres, based on their dependencies, during the runtime stage. Simulations show that our strategy can effectively reduce data movement during workflow execution.
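To make the dependency-based runtime step more concrete, here is a minimal sketch in Python. It assumes that the dependency between two datasets is the number of tasks that use both of them, and it ignores data-centre storage capacity. The names `dependency_matrix` and `place_new_dataset`, and the `tasks` and `placement` dictionaries, are hypothetical illustrations rather than the paper's code. The sketch also leaves out the build-time k-means partitioning and only shows a simple greedy rule that places a newly generated dataset in the data centre holding the datasets it shares the most tasks with.

```python
from itertools import combinations


def dependency_matrix(datasets, tasks):
    """Return dep where dep[i][j] counts the tasks that use both datasets i and j.

    Assumed measure (hypothetical): the diagonal dep[i][i] is the number of
    tasks that use dataset i. `tasks` maps a task id to the datasets it reads.
    """
    index = {d: i for i, d in enumerate(datasets)}
    n = len(datasets)
    dep = [[0] * n for _ in range(n)]
    for inputs in tasks.values():
        # Only count datasets listed in `datasets`; ignore duplicates in a task.
        used = sorted({index[d] for d in inputs if d in index})
        for i in used:
            dep[i][i] += 1
        for i, j in combinations(used, 2):
            dep[i][j] += 1
            dep[j][i] += 1
    return dep


def place_new_dataset(new_ds, placement, tasks):
    """Hypothetical greedy runtime rule: choose the data centre whose stored
    datasets co-occur most often with new_ds in the tasks that consume it.

    `placement` maps an already-placed dataset to its data centre.
    Ties are broken by data-centre name; storage capacity is ignored.
    """
    scores = {dc: 0 for dc in placement.values()}
    for inputs in tasks.values():
        inputs = set(inputs)
        if new_ds not in inputs:
            continue  # this task does not consume the new dataset
        for d in inputs - {new_ds}:
            if d in placement:
                scores[placement[d]] += 1
    # Iterate in sorted order so that max() returns the first-named centre on ties.
    return max(sorted(scores), key=lambda dc: scores[dc])


if __name__ == "__main__":
    tasks = {
        "t1": ["a", "b"],
        "t2": ["b", "c"],
        "t3": ["a", "b", "c"],
        "t4": ["n", "c"],
        "t5": ["n", "c", "a"],
    }
    print(dependency_matrix(["a", "b", "c"], tasks))
    placement = {"a": "DC1", "b": "DC1", "c": "DC2"}
    print(place_new_dataset("n", placement, tasks))
```

In this example, the new dataset `n` is read by tasks `t4` and `t5`. Both of those tasks also read `c`, which is stored in DC2, and only `t5` also reads `a`, which is stored in DC1. The sketch therefore places `n` in DC2. A practical placement strategy would also have to respect storage limits and rebalance clusters over time, and this greedy rule models neither.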