ASTERIX: scalable warehouse-style web data integration

  • Authors:
  • Sattam Alsubaiee; Alexander Behm; Raman Grover; Rares Vernica; Vinayak Borkar; Michael J. Carey; Chen Li

  • Affiliations:
  • University of California, Irvine (all authors)

  • Venue:
  • Proceedings of the Ninth International Workshop on Information Integration on the Web
  • Year:
  • 2012

Abstract

A growing wealth of digital information is generated daily in social networks, blogs, online communities, and similar sources. Organizations and researchers in a wide variety of domains recognize that there is tremendous value and insight to be gained by warehousing this emerging data and making it available for querying, analysis, and other purposes. This new breed of "Big Data" applications places challenging requirements on data management platforms in terms of scalability, flexibility, manageability, and analysis capabilities. At UC Irvine, we are building a next-generation database system, called ASTERIX, in response to these trends. We present ongoing work that addresses the following questions: How does data get into the system? What primitives should we provide to better cope with dirty or noisy data? How can we support efficient analysis of spatial data? Using real examples, we show the capabilities of ASTERIX for ingesting data via feeds, supporting set-similarity predicates for fuzzy matching, and answering spatial aggregation queries.
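
To make the idea of a set-similarity predicate for fuzzy matching concrete, the following is a minimal sketch in Python. It is not ASTERIX code or its query syntax: it assumes Jaccard similarity over whitespace-delimited word tokens as the similarity measure, and the function names (`tokenize`, `jaccard`, `similar`) and the default threshold of 0.5 are hypothetical choices made for illustration only.

```python
from __future__ import annotations


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split on whitespace to get a set of word tokens."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similar(x: str, y: str, threshold: float = 0.5) -> bool:
    """Illustrative predicate: true when the token sets overlap enough.

    The threshold is an assumed parameter, not a value taken from the paper.
    """
    return jaccard(tokenize(x), tokenize(y)) >= threshold
```

A predicate of this kind tolerates small differences such as letter case or an extra word, which is why set similarity is a natural fit for dirty or noisy text; for instance, `similar("UC Irvine campus", "uc irvine Campus")` holds because both strings yield the same token set.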