A hierarchical semantic-based distance for nominal histogram comparison

  • Authors:
  • Camille Kurtz;Pierre Gançarski;Nicolas Passat;Anne Puissant

  • Affiliations:
  • Stanford University, USA and Université de Strasbourg, ICube, UMR 7357, France;Université de Strasbourg, ICube, UMR 7357, France;Université de Reims Champagne-Ardenne, CReSTIC, EA 3804, France;Université de Strasbourg, LIVE, ERL CNRS 7230, France

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2013

Abstract

We propose a new distance, the Hierarchical Semantic-Based Distance (HSBD), for comparing nominal histograms equipped with a dissimilarity matrix that encodes the semantic correlations between bins. HSBD is computed through a hierarchical strategy that progressively merges the considered instances (and their bins) according to their semantic proximity. At each level of this hierarchy, a standard bin-to-bin distance is computed between the corresponding pair of histograms. These bin-to-bin distances are then fused, taking into account the semantic coherency of their associated levels, to obtain the proposed distance. In this way, HSBD can handle histograms that are usually compared with cross-bin distances. It preserves the advantages of such cross-bin distances (namely, robustness to histogram translation and to bin-size issues) while inheriting the low computational cost of bin-to-bin distances. Validation experiments on geographical data classification emphasize the relevance and usefulness of the proposed distance.
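The abstract does not specify the merging criterion, the bin-to-bin distance, or the exact fusion weights, so the following Python sketch only illustrates the general scheme. The choices are assumptions made for illustration, not the published HSBD formula: bins are merged by single-linkage agglomerative clustering over the dissimilarity matrix, the L1 distance is used at each level, and each level receives a hypothetical coherency weight `1 - h / h_max`, where `h` is the height at which that level was formed. The function names `hierarchy_levels` and `hsbd_sketch` are hypothetical.

```python
from typing import List, Sequence, Tuple

Level = Tuple[float, List[List[int]]]  # (merge height, partition of bin indices)


def hierarchy_levels(dissim: Sequence[Sequence[float]]) -> List[Level]:
    """Single-linkage merging of bins (an assumed criterion, for illustration).

    Starts from the finest partition (every bin alone, height 0.0) and merges
    the two closest clusters until a single cluster remains.
    """
    clusters = [[i] for i in range(len(dissim))]
    levels: List[Level] = [(0.0, [c[:] for c in clusters])]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(dissim[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        levels.append((d, [sorted(c) for c in clusters]))
    return levels


def hsbd_sketch(h1: Sequence[float], h2: Sequence[float],
                dissim: Sequence[Sequence[float]]) -> float:
    """Weighted fusion of per-level L1 distances (hypothetical weighting)."""
    levels = hierarchy_levels(dissim)
    max_h = max(height for height, _ in levels)
    dists, weights = [], []
    for height, partition in levels:
        # Project both histograms onto the current partition by summing bins.
        p1 = [sum(h1[i] for i in cluster) for cluster in partition]
        p2 = [sum(h2[i] for i in cluster) for cluster in partition]
        dists.append(sum(abs(x - y) for x, y in zip(p1, p2)))
        # Hypothetical coherency weight: levels formed by merging semantically
        # close bins (low merge height) count more; the finest level gets 1.0.
        weights.append(1.0 - height / max_h if max_h > 0 else 1.0)
    return sum(w * d for w, d in zip(weights, dists)) / sum(weights)


# Bins 0 and 1 are semantically close; bin 2 is far from both.
D = [[0, 1, 4], [1, 0, 4], [4, 4, 0]]
h1, h2, h3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
print(hsbd_sketch(h1, h2, D), hsbd_sketch(h1, h3, D))
```

With these inputs, `hsbd_sketch(h1, h2, D)` evaluates to 8/7 (about 1.143) and `hsbd_sketch(h1, h3, D)` to 2.0, whereas a plain L1 distance gives 2 for both pairs. Mass shifted between semantically close bins is thus penalized less, which is the cross-bin behavior the abstract describes, while each level only costs a linear bin-to-bin comparison once the hierarchy is built.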