Marginality: a numerical mapping for enhanced exploitation of taxonomic attributes

Authors:
Josep Domingo-Ferrer
Affiliations:
Dept. of Computer Engineering and Mathematics, UNESCO Chair in Data Privacy, Universitat Rovira i Virgili, Tarragona, Catalonia, Spain
Venue:
MDAI'12: Proceedings of the 9th International Conference on Modeling Decisions for Artificial Intelligence
Year:
2012


Abstract

Hierarchical attributes appear in taxonomic or ontology-based data (e.g. NACE economic activities, ICD-classified diseases, animal/plant species). Such taxonomic data are often exploited as if they were flat nominal data without hierarchy, which discards substantial information and analytical power. We introduce marginality, a numerical mapping for taxonomic data that allows many of the algorithms and analytical techniques designed for numerical data to be applied to them. We show how to compute descriptive statistics such as the mean, the variance and the covariance on marginality-mapped data. We also define a mathematical distance between records that include hierarchical attributes, based on marginality-based variances. Such a distance paves the way for reusing, on taxonomic data, clustering and anonymization techniques designed for numerical data.
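The abstract does not state the marginality formula, so the following Python sketch is only a rough illustration of the general idea of mapping taxonomic values to numbers. It assumes, without confirmation from the source, that the marginality of a value is the sum of its path distances (counted in edges, with unit edge weights) to every value in the sample; the paper's actual definition and edge weighting may differ. The toy taxonomy and the names `PARENT`, `tree_distance` and `marginality` are hypothetical. Once each value has been replaced by a number, standard numerical tools such as `statistics.mean` and `statistics.pvariance` can be applied directly.

```python
from statistics import mean, pvariance

# Hypothetical toy taxonomy: child -> parent (the root maps to None).
PARENT = {
    "Animal": None,
    "Mammal": "Animal", "Bird": "Animal",
    "Dog": "Mammal", "Cat": "Mammal", "Sparrow": "Bird",
}

def ancestors(node):
    """Return the path from node up to the root, node first."""
    path = [node]
    while PARENT[node] is not None:
        node = PARENT[node]
        path.append(node)
    return path

def tree_distance(a, b):
    """Edges on the path between a and b (unit edge weights assumed)."""
    pa, pb = ancestors(a), ancestors(b)
    depth_in_a = {n: i for i, n in enumerate(pa)}
    for j, n in enumerate(pb):
        if n in depth_in_a:  # first shared node is the lowest common ancestor
            return depth_in_a[n] + j
    raise ValueError("nodes are not in the same tree")

def marginality(value, sample):
    """Assumed form: sum of tree distances from value to every sample value."""
    return sum(tree_distance(value, other) for other in sample)

if __name__ == "__main__":
    sample = ["Dog", "Cat", "Dog", "Sparrow"]
    mapped = [marginality(v, sample) for v in sample]
    print(mapped)  # [6, 8, 6, 12]
    # Ordinary numerical statistics on the mapped values.
    print(mean(mapped), pvariance(mapped))
```

In this toy sample, the outlying value `Sparrow` receives the largest number, which matches the intuition that a marginal value lies far from the rest of the data in the hierarchy.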