An Efficient Cross-Match Implementation Based on Directed Join Algorithm in MapReduce

  • Authors:
  • Cuncang Mi;Qian Chen;Taoying Liu

  • Venue:
  • UCC '11 Proceedings of the 2011 Fourth IEEE International Conference on Utility and Cloud Computing
  • Year:
  • 2011

Abstract

In astronomy, "Cross-Match" is a common operation that mines useful information by joining different star catalogues. Star catalogues obtained through astronomical telescopes are now much larger than ever before, which motivates implementing Cross-Match in a distributed computing environment. Although computer hardware is now cheap and resizable compute capacity is available in the cloud from some web services, we conduct our experiments in a restricted environment to conserve resources as much as possible. We first use Hive from Facebook, but find it less efficient than expected when joining two large catalogues. We then analyze Hive's join process and apply some optimizations, but the result is still not satisfactory. Finally, we design our own Cross-Match program, which is based on the directed join algorithm in MapReduce, takes advantage of the characteristics of astronomical data, and runs on top of Hadoop. Our program improves performance by 86% compared with the common join in Hive when cross-matching USNOA and 2MASS.
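
The abstract does not give the details of the program, so the following is only an illustrative sketch of the general idea behind a partition-aligned (directed-style) join for Cross-Match, not the authors' implementation. It assumes both catalogues are pre-partitioned into declination zones, so that the work unit for one zone of the first catalogue only needs to read the matching and adjacent zones of the second catalogue instead of shuffling all records. The names `ZONE_HEIGHT_DEG`, `zone_of`, `partition_by_zone`, and `cross_match`, the `(id, ra_deg, dec_deg)` record layout, and the zone height are all hypothetical, and the plain Python loop merely stands in for MapReduce map tasks.

```python
import math
from collections import defaultdict

# Hypothetical zone height in degrees; the match radius must not exceed it,
# otherwise checking only the adjacent zones would miss some pairs.
ZONE_HEIGHT_DEG = 1.0


def zone_of(dec_deg):
    """Map a declination in [-90, 90] degrees to an integer zone index."""
    return int(math.floor((dec_deg + 90.0) / ZONE_HEIGHT_DEG))


def partition_by_zone(catalogue):
    """Group (id, ra_deg, dec_deg) records by declination zone."""
    zones = defaultdict(list)
    for star in catalogue:
        zones[zone_of(star[2])].append(star)
    return zones


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cross_match(cat_a, cat_b, radius_deg):
    """Return sorted (id_a, id_b) pairs whose separation is within radius_deg."""
    if radius_deg > ZONE_HEIGHT_DEG:
        raise ValueError("radius must not exceed the zone height")
    zones_a = partition_by_zone(cat_a)
    zones_b = partition_by_zone(cat_b)
    matches = []
    # Each iteration stands in for one map task that owns a zone of cat_a
    # and reads only the aligned partitions of cat_b.
    for zone, stars_a in zones_a.items():
        # Include the neighbouring zones so pairs that straddle a zone
        # boundary are still found.
        candidates = [s for z in (zone - 1, zone, zone + 1)
                      for s in zones_b.get(z, [])]
        for id_a, ra_a, dec_a in stars_a:
            for id_b, ra_b, dec_b in candidates:
                if angular_separation_deg(ra_a, dec_a, ra_b, dec_b) <= radius_deg:
                    matches.append((id_a, id_b))
    return sorted(matches)
```

In this sketch, calling `cross_match(cat_a, cat_b, 1.0 / 3600.0)` would pair records that lie within one arcsecond of each other. The brute-force comparison inside each zone is kept for brevity; a real job would typically also sort or index each zone by right ascension.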