Mosquito: another one bites the data upload stream

Authors:
Stefan Richter;Jens Dittrich;Stefan Schuh;Tobias Frey
Affiliations:
Information Systems Group, Saarland University;Information Systems Group, Saarland University;Information Systems Group, Saarland University;Information Systems Group, Saarland University
Venue:
Proceedings of the VLDB Endowment
Year:
2013

Citing 9
Cited 0

Cooperative scans: dynamic bandwidth sharing in a DBMS
VLDB '07 Proceedings of the 33rd international conference on Very large data bases

MRShare: sharing across multiple queries in MapReduce
Proceedings of the VLDB Endowment

Hadoop++: making a yellow elephant run like a cheetah (without it even noticing)
Proceedings of the VLDB Endowment

Merging what's cracked, cracking what's merged: adaptive indexing in main-memory column-stores
Proceedings of the VLDB Endowment

Trojan data layouts: right shoes for a running elephant
Proceedings of the 2nd ACM Symposium on Cloud Computing

Only aggressive elephants are fast elephants
Proceedings of the VLDB Endowment

NoDB in action: adaptive query processing on raw data
Proceedings of the VLDB Endowment

Invisible loading: access-driven data transfer from raw files into database systems
Proceedings of the 16th International Conference on Extending Database Technology

CARTILAGE: adding flexibility to the Hadoop skeleton
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data

Abstract

Mosquito is a lightweight and adaptive physical design framework for Hadoop. Mosquito connects to existing data pipelines in Hadoop MapReduce and/or HDFS, observes the data, and creates better physical designs, i.e., indexes, as a byproduct. Our approach is minimally invasive, yet it allows users and developers to easily improve the runtime of Hadoop. We present three important use cases: first, how to create indexes as a byproduct of data uploads into HDFS; second, how to create indexes as a byproduct of map tasks; and third, how to execute map tasks as a byproduct of HDFS data uploads. These use cases may even be combined.
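
The first use case, creating an index as a byproduct of an upload, can be illustrated with a minimal sketch. This is not Mosquito's implementation or API: the function names upload_with_index and lookup are hypothetical, an in-memory byte buffer stands in for HDFS, and records are assumed to be newline-terminated text lines with the key in the first comma-separated field. The sketch only shows the general idea of observing a stream that is being written anyway and collecting (key, offset) pairs on the side.

```python
import bisect
import io


def upload_with_index(lines, sink, key_of):
    """Copy records to `sink` unchanged and collect (key, byte offset) pairs.

    Hypothetical helper for illustration only; `sink` is any object with a
    binary write() method and stands in for the real upload target.
    """
    entries = []
    offset = 0
    for line in lines:
        data = line.encode("utf-8")
        sink.write(data)                        # the upload itself is untouched
        entries.append((key_of(line), offset))  # the index is the byproduct
        offset += len(data)
    entries.sort()                              # sort once, after the upload
    return entries


def lookup(entries, key):
    """Return the byte offsets of all records whose key equals `key`."""
    keys = [k for k, _ in entries]
    lo = bisect.bisect_left(keys, key)
    hi = bisect.bisect_right(keys, key)
    return [off for _, off in entries[lo:hi]]


# Example: upload four records and then locate those with key 1.
records = ["3,carol\n", "1,alice\n", "2,bob\n", "1,dave\n"]
sink = io.BytesIO()
index = upload_with_index(records, sink, key_of=lambda r: int(r.split(",")[0]))
offsets = lookup(index, 1)
```

The uploaded bytes are identical to what a plain copy would have produced, so a reader that ignores the index is unaffected, while a reader that uses it can seek directly to the matching offsets instead of scanning the whole file.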