A growing wealth of digital information is generated daily in social networks, blogs, online communities, and similar sources. Organizations and researchers in a wide variety of domains recognize that there is tremendous value and insight to be gained by warehousing this emerging data and making it available for querying, analysis, and other purposes. This new breed of "Big Data" applications places demanding requirements on data management platforms in terms of scalability, flexibility, manageability, and analysis capabilities. At UC Irvine, we are building a next-generation database system, called ASTERIX, in response to these trends. We present ongoing work that addresses the following questions: How does data get into the system? What primitives should we provide to better cope with dirty or noisy data? How can we support efficient analysis of spatial data? Using real examples, we show the capabilities of ASTERIX for ingesting data via feeds, supporting set-similarity predicates for fuzzy matching, and answering spatial aggregation queries.
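To make the idea of a set-similarity predicate for fuzzy matching concrete, the following is a minimal Python sketch, assuming that each string is reduced to a set of lower-cased alphanumeric word tokens and that two strings "match" when their Jaccard similarity reaches a threshold (0.5 here). The function names `word_tokens`, `jaccard`, `similar`, and `fuzzy_join` are hypothetical illustrations; this is not ASTERIX's query language or its implementation.

```python
import re


def word_tokens(text: str) -> set[str]:
    """Lower-case the text and split it into a set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity len(a & b) / len(a | b); two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similar(x: str, y: str, threshold: float = 0.5) -> bool:
    """Set-similarity predicate: true when the token sets are at least `threshold` similar."""
    return jaccard(word_tokens(x), word_tokens(y)) >= threshold


def fuzzy_join(left: list[str], right: list[str], threshold: float = 0.5) -> list[tuple[str, str]]:
    """Naive nested-loop similarity join (hypothetical helper): compares every pair."""
    return [(l, r) for l in left for r in right if similar(l, r, threshold)]


if __name__ == "__main__":
    left = ["Big Data platform at UC Irvine"]
    right = ["UC Irvine big data platform", "spatial aggregation queries"]
    # Word order and capitalization differ, yet the first pair shares 5 of 6 distinct tokens.
    print(fuzzy_join(left, right))
```

The nested-loop join compares every pair and therefore grows quadratically with the input sizes; dedicated set-similarity join algorithms use techniques such as prefix filtering to skip pairs that cannot reach the threshold.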