ASTERIX: towards a scalable, semistructured data platform for evolving-world models

Authors:
Alexander Behm; Vinayak R. Borkar; Michael J. Carey; Raman Grover; Chen Li; Nicola Onose; Rares Vernica; Alin Deutsch; Yannis Papakonstantinou; Vassilis J. Tsotras
Affiliations:
University of California, Irvine, USA; University of California, Irvine, USA; University of California, Irvine, USA; University of California, Irvine, USA; University of California, Irvine, USA; University of California, Irvine, USA; University of California, Irvine, USA; University of California, San Diego, USA; University of California, San Diego, USA; University of California, Riverside, USA
Venue:
Distributed and Parallel Databases
Year:
2011

Abstract

ASTERIX is a new data-intensive storage and computing platform project spanning UC Irvine, UC Riverside, and UC San Diego. In this paper we provide an overview of the ASTERIX project, starting with its main goal: the storage and analysis of data pertaining to evolving-world models. We describe the requirements and associated challenges, and explain how the project is addressing them. We provide a technical overview of ASTERIX, covering its architecture, its user model for data and queries, and its approach to scalable query processing and data management. ASTERIX runs on Hyracks, a new scalable runtime computational platform that we also describe at an overview level; we have recently released Hyracks as open source for use by other interested parties. We also relate our work on ASTERIX to the current state of the art and describe the research challenges that we are currently tackling as well as those that lie ahead.