Mosquito: another one bites the data upload stream

  • Authors:
  • Stefan Richter; Jens Dittrich; Stefan Schuh; Tobias Frey

  • Affiliations:
  • Information Systems Group, Saarland University (all authors)

  • Venue:
  • Proceedings of the VLDB Endowment
  • Year:
  • 2013

Abstract

Mosquito is a lightweight and adaptive physical design framework for Hadoop. Mosquito connects to existing data pipelines in Hadoop MapReduce and/or HDFS, observes the data, and creates better physical designs, i.e., indexes, as a byproduct. Our approach is minimally invasive, yet it allows users and developers to easily improve the runtime of Hadoop. We present three important use cases: first, how to create indexes as a byproduct of data uploads into HDFS; second, how to create indexes as a byproduct of map tasks; and third, how to execute map tasks as a byproduct of HDFS data uploads. These use cases may even be combined.
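
The first use case, building an index as a byproduct of an upload, can be illustrated with a small sketch. This is not Mosquito's implementation and uses no Hadoop or HDFS API: the class `IndexingUploadStream`, the function `lookup`, the `key_of` parameter, and the newline-delimited record format are all assumptions made for illustration. The idea shown is only that a wrapper can forward records to their destination unchanged while recording, on the side, where each key landed.

```python
import io
from bisect import bisect_left


class IndexingUploadStream:
    """Hypothetical pass-through writer (not a Hadoop/HDFS class).

    Forwards each record to `sink` unmodified and, as a byproduct,
    remembers the byte offset at which each record's key was written.
    """

    def __init__(self, sink, key_of):
        self.sink = sink          # any object with a write(bytes) method
        self.key_of = key_of      # assumed helper: extracts a key from a record
        self.offset = 0           # bytes written to the sink so far
        self.entries = []         # (key, byte offset) pairs, in arrival order

    def write_record(self, record: bytes) -> None:
        # Observe the record, then pass it on untouched.
        self.entries.append((self.key_of(record), self.offset))
        self.sink.write(record)
        self.offset += len(record)

    def close(self):
        # Sort once at the end so lookups can use binary search.
        self.entries.sort()
        return self.entries


def lookup(index, data: bytes, key: bytes):
    """Return all newline-terminated records in `data` whose key equals `key`."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    results = []
    while i < len(index) and index[i][0] == key:
        start = index[i][1]
        end = data.index(b"\n", start) + 1
        results.append(data[start:end])
        i += 1
    return results
```

In this sketch the uploaded bytes are identical to what a plain write would produce, which mirrors the "minimally invasive" goal stated in the abstract; the index is extra output that a later reader may use to avoid a full scan.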