Clustering heterogeneous data using clustering by compression

  • Authors:
  • Alexandra Cernian; Dorin Carstoiu

  • Affiliations:
  • Automatic Control and Computer Science Faculty, Politehnica University of Bucharest, Romania; Automatic Control and Computer Science Faculty, Politehnica University of Bucharest, Romania

  • Venue:
  • ICCOMP'09: Proceedings of the WSEAS 13th International Conference on Computers
  • Year:
  • 2009


Abstract

Today we must handle large quantities of unstructured data produced by many different sources. Clustering is essential for extracting structured information from the World Wide Web in response to user queries. In this paper, we test a new clustering technique, clustering by compression, on heterogeneous data sets. The clustering-by-compression procedure is based on a parameter-free, universal similarity distance, the normalized compression distance (NCD), which is computed from the lengths of compressed data files, both individually and concatenated pairwise.
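To make the last sentence concrete: for two inputs x and y and a compressor whose output length is C(·), the standard NCD is NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where xy is the concatenation of x and y. Values near 0 mean the inputs share most of their structure; values near 1 mean they are unrelated. The sketch below is illustrative only and is not the authors' implementation. It assumes Python's standard `zlib` module as the compressor (the choice of compressor is an assumption), and the helper names `compressed_size`, `ncd`, and `ncd_matrix` are hypothetical. It computes the pairwise NCD matrix that a clustering step would consume; the clustering step itself is not shown.

```python
import zlib
from itertools import combinations


def compressed_size(data: bytes) -> int:
    """Length of `data` after compression.

    Assumption: zlib at level 9 stands in for the compressor C;
    any real-world compressor could be substituted here.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_size(x)        # compressed length of x alone
    cy = compressed_size(y)        # compressed length of y alone
    cxy = compressed_size(x + y)   # compressed length of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(items: list[bytes]) -> list[list[float]]:
    """Symmetric pairwise NCD matrix for a list of inputs.

    Real compressors give C(xy) and C(yx) that differ slightly, so the
    value computed for i < j is mirrored to (j, i) for simplicity.
    The diagonal is set to 0 by convention, because a real compressor
    yields a small positive NCD(x, x).
    """
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = ncd(items[i], items[j])
        matrix[i][j] = matrix[j][i] = d
    return matrix


if __name__ == "__main__":
    # Two similar texts and one unrelated byte sequence (toy data).
    docs = [
        b"the quick brown fox jumps over the lazy dog " * 20,
        b"the quick brown fox leaps over the lazy cat " * 20,
        bytes(range(256)) * 4,
    ]
    for row in ncd_matrix(docs):
        print(" ".join(f"{d:.3f}" for d in row))
```

On this toy input, the two similar texts should be much closer to each other than to the unrelated byte sequence. A matrix like this can be passed to any distance-based clustering method. Because the compressor is the only "parameter", swapping `zlib` for a stronger compressor changes the distances without changing the procedure.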