Hyracks: A flexible and extensible foundation for data-intensive computing

Authors:
Vinayak Borkar;Michael Carey;Raman Grover;Nicola Onose;Rares Vernica
Affiliations:
Computer Science Department, University of California, Irvine, 92697, USA;Computer Science Department, University of California, Irvine, 92697, USA;Computer Science Department, University of California, Irvine, 92697, USA;Computer Science Department, University of California, Irvine, 92697, USA;Computer Science Department, University of California, Irvine, 92697, USA
Venue:
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Year:
2011

Citing 0
Cited 36

Map-reduce extensions and recursive queries

Proceedings of the 14th International Conference on Extending Database Technology
ASTERIX: towards a scalable, semistructured data platform for evolving-world models

Distributed and Parallel Databases
Cluster computing, recursion and datalog

Datalog'10 Proceedings of the First international conference on Datalog Reloaded
The HaLoop approach to large-scale iterative data analysis

The VLDB Journal — The International Journal on Very Large Data Bases
Inside "Big Data management": ogres, onions, or parfaits?

Proceedings of the 15th International Conference on Extending Database Technology
An optimization framework for map-reduce queries

Proceedings of the 15th International Conference on Extending Database Technology
Transitive closure and recursive Datalog implemented on clusters

Proceedings of the 15th International Conference on Extending Database Technology
Adaptive MapReduce using situation-aware mappers

Proceedings of the 15th International Conference on Extending Database Technology
Improving online aggregation performance for skewed data distribution

DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part I
Integrating open government data with stratosphere for more transparency

Web Semantics: Science, Services and Agents on the World Wide Web
Massively-parallel stream processing under QoS constraints with Nephele

Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Big data platforms: What's next?

XRDS: Crossroads, The ACM Magazine for Students - Big Data
ASTERIX: scalable warehouse-style web data integration

Proceedings of the Ninth International Workshop on Information Integration on the Web
Early accurate results for advanced analytics on MapReduce

Proceedings of the VLDB Endowment
The seven deadly sins of cloud computing research

HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Computing
Opening the black boxes in data flow optimization

Proceedings of the VLDB Endowment
Spinning fast iterative data flows

Proceedings of the VLDB Endowment
REX: recursive, delta-based data-centric computation

Proceedings of the VLDB Endowment
The MADlib analytics library: or MAD skills, the SQL

Proceedings of the VLDB Endowment
ASTERIX: an open source system for "Big Data" management and analysis (demo)

Proceedings of the VLDB Endowment
SCOPE: parallel databases meet MapReduce

The VLDB Journal — The International Journal on Very Large Data Bases
Optimizing large-scale Semi-Naïve datalog evaluation in hadoop

Datalog 2.0'12 Proceedings of the Second international conference on Datalog in Academia and Industry
Sparkler: supporting large-scale matrix factorization

Proceedings of the 16th International Conference on Extending Database Technology
Shark: SQL and rich analytics at scale

Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
A bloat-aware design for big data applications

Proceedings of the 2013 international symposium on memory management
Large-scale computation not at the cost of expressiveness

HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
"All roads lead to Rome": optimistic recovery for distributed iterative data processing

Proceedings of the 22nd ACM international conference on information & knowledge management
Revisiting aggregation techniques for big data

Proceedings of the sixteenth international workshop on Data warehousing and OLAP
The family of mapreduce and large-scale data processing systems

ACM Computing Surveys (CSUR)
Scalable lineage capture for debugging DISC analytics

Proceedings of the 4th annual Symposium on Cloud Computing
Pregelix: dataflow-based big graph analytics

Proceedings of the 4th annual Symposium on Cloud Computing
Continuous cloud-scale query optimization and processing

Proceedings of the VLDB Endowment
Piranha: optimizing short jobs in Hadoop

Proceedings of the VLDB Endowment
REEF: retainable evaluator execution framework

Proceedings of the VLDB Endowment
Scalable topic-specific influence analysis on microblogs

Proceedings of the 7th ACM international conference on Web search and data mining
Nephele streaming: stream processing under QoS constraints at scale

Cluster Computing

Abstract

Hyracks is a new partitioned-parallel software platform designed to run data-intensive computations on large shared-nothing clusters of computers. Hyracks allows users to express a computation as a DAG of data operators and connectors. Operators operate on partitions of input data and produce partitions of output data, while connectors repartition operators' outputs to make the newly produced partitions available at the consuming operators. We describe the Hyracks end user model, for authors of dataflow jobs, and the extension model for users who wish to augment Hyracks' built-in library with new operator and/or connector types. We also describe our initial Hyracks implementation. Since Hyracks is in roughly the same space as the open source Hadoop platform, we compare Hyracks with Hadoop experimentally for several different kinds of use cases. The initial results demonstrate that Hyracks has significant promise as a next-generation platform for data-intensive applications.
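The operator/connector model in the abstract can be illustrated with a small, single-process sketch. This is not the Hyracks API: the names `tokenize_operator`, `count_operator`, `hash_connector`, and `run_job` are hypothetical, and the word-count job and CRC32-based partitioning are assumptions chosen only to show how operators work on individual partitions while a connector repartitions their output for the consuming operator.

```python
import zlib
from collections import Counter
from typing import Dict, List

Partition = List[str]


def tokenize_operator(partition: Partition) -> Partition:
    """Operator: consumes one input partition and produces one output partition."""
    return [word for line in partition for word in line.split()]


def hash_connector(partitions: List[Partition], num_consumers: int) -> List[Partition]:
    """Connector: repartitions producer outputs so that equal keys
    always land in the same consumer partition (hypothetical hash scheme)."""
    out: List[Partition] = [[] for _ in range(num_consumers)]
    for partition in partitions:
        for record in partition:
            # CRC32 is used instead of hash() so the routing is stable across runs.
            target = zlib.crc32(record.encode("utf-8")) % num_consumers
            out[target].append(record)
    return out


def count_operator(partition: Partition) -> Dict[str, int]:
    """Operator: aggregates the records of one repartitioned partition."""
    return dict(Counter(partition))


def run_job(inputs: List[Partition], num_consumers: int) -> List[Dict[str, int]]:
    """A three-node DAG: tokenize -> hash connector -> count."""
    tokenized = [tokenize_operator(p) for p in inputs]
    repartitioned = hash_connector(tokenized, num_consumers)
    return [count_operator(p) for p in repartitioned]


if __name__ == "__main__":
    for i, counts in enumerate(run_job([["a b a"], ["b c"]], num_consumers=2)):
        print(i, counts)
```

Because the connector routes every occurrence of a word to the same consumer partition, each count operator can aggregate locally without coordinating with the others. In a real shared-nothing cluster the partitions would live on different machines and the connector would move data over the network.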