A measure of variance for hierarchical nominal attributes

  • Authors:
  • Josep Domingo-Ferrer;Agusti Solanas

  • Affiliations:
  • Universitat Rovira i Virgili, UNESCO Chair in Data Privacy, Department of Computer Engineering and Maths, Av. Països Catalans 26, E-43007 Tarragona, Catalonia, Spain;Universitat Rovira i Virgili, UNESCO Chair in Data Privacy, Department of Computer Engineering and Maths, Av. Països Catalans 26, E-43007 Tarragona, Catalonia, Spain

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2008

Quantified Score

Hi-index 0.07

Abstract

The need to measure the dispersion of nominal categorical attributes arises in several applications, such as clustering and data anonymization. For a nominal attribute whose categories can be hierarchically classified, a measure of the variance of a sample drawn from that attribute is proposed which takes the attribute's hierarchy into account. The new measure is the reciprocal of "consanguinity": the less related the nominal categories in the sample, the higher the measured variance. For non-hierarchical nominal attributes, the proposed measure yields results consistent with previous diversity indicators. Applications of the new nominal variance measure to economic diversity measurement and data anonymization are also discussed.