MapReduce is a programming paradigm for effective processing of large datasets in distributed environments, using map and reduce functions. The map phase creates (key, value) pairs, while the reduce phase aggregates values that share a key. In other words, a MapReduce application defines and reduces one set of values for each key, so the user learns only one aspect of each key. Advanced OLAP applications, however, require multiple sets, not necessarily mutually disjoint, to be defined and reduced for the same key. The challenge is to extend MapReduce to support this in a syntactically simple and computationally efficient way. We propose an extension to the classic MapReduce model, called Tagged MapReduce, in which data is represented as (key, value, tag) triplets. Users map triplets, and reduction takes place for each key and for each tag. For example, given a set of pages, one may want to count word occurrences per page type, with the page type represented by the tag. While classic MapReduce can handle this class of queries, efficient implementations require effort and possibly advanced programming skills. For example, should the tag form a compound object with the key or with the value? Our formalism is simpler for the programmer to use and makes it easier for the system to identify and apply efficient algorithms.
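The word-count-per-page-type example can be illustrated with a minimal, single-process Python sketch. It is not the authors' implementation and says nothing about how a distributed system would execute the model; it only shows the (key, value, tag) data flow. The function names (`map_page`, `tagged_reduce`), the page dictionary layout, and the sample pages are all hypothetical, introduced here for illustration. A page may carry more than one type, so the value sets reduced for one key are not necessarily disjoint.

```python
from collections import defaultdict


def map_page(page):
    """Emit one (key, value, tag) triplet per word and per page type.

    Assumed (hypothetical) page layout: {"text": str, "types": list of str}.
    """
    for word in page["text"].lower().split():
        for page_type in page["types"]:
            yield (word, 1, page_type)


def tagged_reduce(triplets, reducer=sum):
    """Group values by key, then by tag, and reduce each (key, tag) group."""
    groups = defaultdict(lambda: defaultdict(list))
    for key, value, tag in triplets:
        groups[key][tag].append(value)
    return {
        key: {tag: reducer(values) for tag, values in by_tag.items()}
        for key, by_tag in groups.items()
    }


# Illustrative input; the second page belongs to two page types.
pages = [
    {"text": "data cube data", "types": ["news"]},
    {"text": "data mining", "types": ["blog", "news"]},
]

triplets = (t for page in pages for t in map_page(page))
counts = tagged_reduce(triplets)

for word in sorted(counts):
    print(word, dict(sorted(counts[word].items())))
```

In this sketch the tag is kept as a separate field instead of being folded into the key or the value, which is the design question the abstract raises; the programmer only states the tag, and the grouping strategy is left to `tagged_reduce`.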