Clustering and semantics preservation in cultural heritage information spaces

  • Authors:
  • Javier Pereira;Felipe Schmidt;Pedro Contreras;Fionn Murtagh;Hernan Astudillo

  • Affiliations:
  • Universidad Diego Portales, Santiago, Chile;Universidad Diego Portales, Santiago, Chile;University of London, Egham Hill, Surrey, England;University of London, Egham Hill, Surrey, England;Universidad Técnica Federico Santa María, Valparaíso, Chile

  • Venue:
  • RIAO '10 Adaptivity, Personalization and Fusion of Heterogeneous Information
  • Year:
  • 2010

Abstract

In this paper, we analyze whether the original semantic similarity among objects is preserved when dimensionality reduction is applied to the source data and clustering is then performed on the reduced data. An experiment is designed to compare Baire (longest common prefix ultrametric) clustering with K-Means when random projection is applied first. A data matrix extracted from a cultural heritage database was prepared for the experiment. Because the random projection produces a vector with components in the interval [0, 1], clusters can be obtained at different levels of digit precision. The mean semantic similarity of the clusters is then calculated using a modified version of the Jaccard index. Our findings show that semantics is difficult to preserve with these methods. However, a Student's t-test on mean similarity indicates that Baire clusters are semantically better than K-Means clusters as the digit precision increases, at the cost of a growing number of orphan objects. Despite this cost, we argue that the ultrametric technique provides an efficient way to detect semantic homogeneity in the original data space.
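The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: all function names are hypothetical, the rescaling of projected values into [0, 1) is an assumption, the plain Jaccard index stands in for the paper's unspecified "modified" version, and an "orphan" is assumed here to mean an object left alone in its cluster. Only the Baire side of the comparison is shown; K-Means is omitted.

```python
import random
from collections import defaultdict
from itertools import combinations


def random_projection(rows, seed=0):
    """Project each row onto one random vector, then rescale into [0, 1).

    Assumption: the rescaling by the maximum is illustrative only; the
    abstract states just that projected components lie in [0, 1].
    """
    rng = random.Random(seed)
    direction = [rng.random() for _ in range(len(rows[0]))]
    raw = [sum(x * w for x, w in zip(row, direction)) for row in rows]
    top = max(raw) or 1.0
    return [min(v / top, 0.999999999) for v in raw]


def baire_clusters(values, precision):
    """Group values in [0, 1) by their first `precision` decimal digits.

    Two values share a cluster exactly when their longest common digit
    prefix has length >= precision (the Baire ultrametric idea).
    """
    clusters = defaultdict(list)
    for idx, v in enumerate(values):
        key = f"{v:.12f}"[2:2 + precision]  # digits after "0."
        clusters[key].append(idx)
    return dict(clusters)


def jaccard(a, b):
    """Plain Jaccard index of two sets (a stand-in for the modified index)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def mean_cluster_similarity(cluster, term_sets):
    """Mean pairwise Jaccard similarity inside one cluster; None if < 2 members."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return None
    return sum(jaccard(term_sets[i], term_sets[j]) for i, j in pairs) / len(pairs)


def count_orphans(clusters):
    """Number of objects that sit alone in their cluster (assumed meaning of 'orphan')."""
    return sum(1 for members in clusters.values() if len(members) == 1)


if __name__ == "__main__":
    # Toy binary object-by-term matrix; purely illustrative data.
    matrix = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
    term_sets = [{j for j, x in enumerate(row) if x} for row in matrix]
    projected = random_projection(matrix)
    for precision in (1, 2, 3):
        clusters = baire_clusters(projected, precision)
        sims = [mean_cluster_similarity(c, term_sets) for c in clusters.values()]
        print(precision, clusters, sims, count_orphans(clusters))
```

Raising `precision` splits clusters into finer prefix groups, which is where the trade-off noted in the abstract shows up: the remaining multi-member clusters tend to be tighter, while more objects end up alone.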