Performance Considerations of Data Acquisition in Hadoop System

Authors:
Baodong Jia;Tomasz Wiktor Wlodarczyk;Chunming Rong
Venue:
CLOUDCOM '10 Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science
Year:
2010

Citing 0
Cited 2

An intelligent cloud system adopting file pre-fetching

ADCONS'11 Proceedings of the 2011 international conference on Advanced Computing, Networking and Security
Input data organization for batch processing in time window based computations

Proceedings of the 28th Annual ACM Symposium on Applied Computing

Abstract

Data has become increasingly important in recent years, especially for large companies, which stand to benefit greatly from mining useful information out of it. The oil and gas industry has to deal with vast amounts of data, in both real-time and historical contexts. Because the amount of data is so large, processing it is usually infeasible or very time-consuming. In our project we investigate the use of Hadoop to solve this problem. To run Hadoop jobs, the data must first exist in the Hadoop file system, which creates the problem of data acquisition. In this paper, two solutions are investigated and their performance is compared; the solution based on Chukwa is shown to be more efficient than a naïve implementation, in particular for larger file sizes.