Clustering with Apache Hadoop

  • Authors:
  • S. Nair; J. Mehta

  • Affiliations:
  • Shah and Anchor Kutchhi Engineering College, Chembur, Mumbai, India; Shah and Anchor Kutchhi Engineering College, Chembur, Mumbai, India

  • Venue:
  • Proceedings of the International Conference & Workshop on Emerging Trends in Technology
  • Year:
  • 2011

Abstract

The self-organizing map (SOM) is an unsupervised neural network that projects high-dimensional data onto a low-dimensional grid and visually reveals the topological order of the original data. This makes the SOM a useful tool in the exploratory phase of data mining, and SOMs have been applied successfully in many fields, including engineering and business. Experimental results on a census database illustrate the clustering. The paper proposes to improve clustering performance through cloud computing. The approach focuses on Hadoop, a Java-based software framework that distributes processing over a cluster of processors and provides an open-source implementation of MapReduce, a programming model designed for the detailed analysis and transformation of very large data sets.
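
The abstract does not say how SOM training is split across MapReduce, so the following is only a minimal sketch of one common option, the batch SOM update, written as a map step and a reduce step in plain Python. The function names (`best_matching_unit`, `som_map`, `som_reduce`, `batch_som_epoch`), the Gaussian neighborhood, and the in-memory grouping that stands in for Hadoop's shuffle phase are all assumptions made for illustration; none of them is taken from the paper.

```python
import math
from collections import defaultdict


def best_matching_unit(weights, x):
    """Return the grid coordinate of the node whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def som_map(weights, x, sigma):
    """Map step (hypothetical): for one input vector, emit a partial sum per node.

    Each emitted value is (h * x, h), where h is a Gaussian neighborhood
    weight based on grid distance to the best-matching unit of x.
    """
    bmu = best_matching_unit(weights, x)
    for node in weights:
        grid_dist_sq = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
        h = math.exp(-grid_dist_sq / (2.0 * sigma ** 2))
        yield node, ([h * v for v in x], h)


def som_reduce(node, values):
    """Reduce step (hypothetical): new weight = neighborhood-weighted mean of inputs."""
    dim = len(values[0][0])
    numerator = [0.0] * dim
    denominator = 0.0
    for weighted_vector, h in values:
        for i in range(dim):
            numerator[i] += weighted_vector[i]
        denominator += h
    return node, [n / denominator for n in numerator]


def batch_som_epoch(weights, data, sigma):
    """Run one batch-SOM epoch; the dict grouping imitates the shuffle phase."""
    grouped = defaultdict(list)
    for x in data:
        for node, value in som_map(weights, x, sigma):
            grouped[node].append(value)
    return dict(som_reduce(node, values) for node, values in grouped.items())


# Toy usage: a 1x2 grid and two well-separated groups of 2-D points.
weights = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 1.0]}
data = [[0.1, 0.0], [0.0, 0.2], [0.9, 1.0], [1.0, 0.8]]
for _ in range(5):
    weights = batch_som_epoch(weights, data, sigma=0.5)
```

The batch form is used here because each input vector contributes an independent partial sum, so the map step needs only a read-only copy of the current weights and the reduce step is a simple sum and divide. On a real Hadoop cluster the weights would have to be distributed to every mapper before each epoch, which this sketch does not show.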