We present CoScan, a scheduling framework that eliminates redundant processing in workflows that scan large batches of data in a map-reduce computing environment. CoScan merges Pig programs from multiple users at runtime to reduce I/O contention while still honoring soft deadlines during scheduling. This support extends to join workflows that operate on multiple data sources. Our solution maps well to workflows at many Internet companies that reuse data from a common set of inputs. Experiments on the PigMix data analytics benchmark show orders-of-magnitude reductions in resource contention with minimal impact on latency.
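The abstract does not spell out how CoScan decides which programs to merge. The following is a hypothetical greedy sketch of the general idea of scan sharing under soft deadlines. It is not CoScan's algorithm. Jobs stand in for Pig programs, and jobs that read the same set of inputs are batched so that each input is scanned once per batch. This grouping also covers joins, which read several inputs. The `Job` fields, the additive cost model (one scan per input plus a per-job `cpu_cost`), and the helpers `merge_jobs` and `scans_saved` are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical stand-in for a submitted Pig program."""
    name: str
    inputs: frozenset  # datasets the job scans (two or more for a join)
    deadline: float    # soft deadline, seconds from submission (assumed unit)
    cpu_cost: float    # job-specific processing time on top of the shared scan


def merge_jobs(jobs, scan_cost):
    """Greedily group jobs that read the same input set into shared-scan batches.

    Assumed cost model: a batch's runtime is one scan of each input plus the sum
    of its members' cpu_cost. A job joins the open batch only if that runtime
    stays within the batch's earliest deadline; otherwise a new batch is opened.
    Deadlines are soft: a job that cannot meet its deadline even alone still runs.
    """
    groups = defaultdict(list)
    for job in sorted(jobs, key=lambda j: j.deadline):
        groups[job.inputs].append(job)

    batches = []
    for inputs, group in groups.items():
        io = sum(scan_cost[d] for d in inputs)  # one scan per input per batch
        batch, cpu = [], 0.0
        for job in group:
            # batch[0] holds the earliest deadline because group is sorted
            if batch and io + cpu + job.cpu_cost > batch[0].deadline:
                batches.append(batch)
                batch, cpu = [], 0.0
            batch.append(job)
            cpu += job.cpu_cost
        batches.append(batch)
    return batches


def scans_saved(jobs, batches):
    """Dataset scans avoided compared with running every job separately."""
    unshared = sum(len(j.inputs) for j in jobs)
    shared = sum(len(b[0].inputs) for b in batches)
    return unshared - shared


# Usage with made-up datasets and costs
scan_cost = {"page_views": 100.0, "users": 20.0}
jobs = [
    Job("q1", frozenset({"page_views"}), deadline=150, cpu_cost=10),
    Job("q2", frozenset({"page_views"}), deadline=200, cpu_cost=15),
    Job("q3", frozenset({"page_views"}), deadline=130, cpu_cost=25),
    Job("j1", frozenset({"page_views", "users"}), deadline=300, cpu_cost=30),
]
batches = merge_jobs(jobs, scan_cost)
print([[j.name for j in b] for b in batches])  # [['q3'], ['q1', 'q2'], ['j1']]
print(scans_saved(jobs, batches))              # 1
```

In this toy run, `q3` has the tightest deadline, and sharing a scan with it would push the batch past 130 seconds, so it runs alone. `q1` and `q2` share one scan of `page_views`. The greedy, deadline-ordered pass is only one simple way to trade I/O sharing against latency. A real scheduler would also model cluster capacity and start times.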