ASTERIX: scalable warehouse-style web data integration

Authors:
Sattam Alsubaiee;Alexander Behm;Raman Grover;Rares Vernica;Vinayak Borkar;Michael J. Carey;Chen Li
Affiliations:
University of California, Irvine;University of California, Irvine;University of California, Irvine;University of California, Irvine;University of California, Irvine;University of California, Irvine;University of California, Irvine
Venue:
Proceedings of the Ninth International Workshop on Information Integration on the Web
Year:
2012

Abstract

A growing wealth of digital information is generated daily in social networks, blogs, online communities, and similar sources. Organizations and researchers in a wide variety of domains recognize that there is tremendous value and insight to be gained by warehousing this emerging data and making it available for querying, analysis, and other purposes. This new breed of "Big Data" applications places demanding requirements on data management platforms in terms of scalability, flexibility, manageability, and analysis capabilities. At UC Irvine, we are building a next-generation database system, called ASTERIX, in response to these trends. We present ongoing work that addresses the following questions: How does data get into the system? What primitives should we provide to better cope with dirty or noisy data? How can we support efficient analysis of spatial data? Using real examples, we show the capabilities of ASTERIX for ingesting data via feeds, supporting set-similarity predicates for fuzzy matching, and answering spatial aggregation queries.