The performance of data-parallel computing systems (e.g., MapReduce, DryadLINQ) depends heavily on how their data is partitioned. The partitioning schemes implemented by current state-of-the-art systems are far from optimal. Techniques proposed by the database community for finding optimal data partitions are not directly applicable when complex user-defined functions and data models are involved. We outline our solution, which draws on expertise from fields such as programming languages and optimization, and present our preliminary results.