What is the Dimension of Your Binary Data?

Authors:
Nikolaj Tatti; Taneli Mielikainen; Aristides Gionis; Heikki Mannila
Affiliations:
University of Helsinki and Helsinki University of Technology, Finland; University of Helsinki, Finland; University of Helsinki and Helsinki University of Technology, Finland; University of Helsinki and Helsinki University of Technology, Finland
Venue:
ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Year:
2006

Cited by (12):

2008 Special Issue: An axiomatic approach to intrinsic dimension of a dataset. Neural Networks.
Mining non-redundant high order correlations in binary data. Proceedings of the VLDB Endowment.
Capturing truthiness: mining truth tables in binary datasets. Proceedings of the 2009 ACM symposium on Applied Computing.
Factor Analysis of Incidence Data via Novel Decomposition of Matrices. ICFCA '09 Proceedings of the 7th International Conference on Formal Concept Analysis.
On Social Networks Reduction. ISMIS '09 Proceedings of the 18th International Symposium on Foundations of Intelligent Systems.
Analyzing Social Networks Using FCA: Complexity Aspects. WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03.
Discovery of optimal factors in binary data via a novel method of matrix decomposition. Journal of Computer and System Sciences.
On local intrinsic dimension estimation and its applications. IEEE Transactions on Signal Processing.
Factorizing three-way binary data with triadic formal concepts. KES'10 Proceedings of the 14th international conference on Knowledge-based and intelligent information and engineering systems: Part I.
Optimal decompositions of matrices with grades into binary and graded matrices. Annals of Mathematics and Artificial Intelligence.
Model order selection for boolean matrix factorization. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining.
Factorizing three-way ordinal data using triadic formal concepts. FQAS'11 Proceedings of the 9th international conference on Flexible Query Answering Systems.

Abstract

Many 0/1 datasets have a very large number of variables; however, they are sparse, and the dependency structure of the variables is simpler than the number of variables would suggest. Defining the effective dimensionality of such a dataset is a nontrivial problem. We consider the problem of defining a robust measure of dimension for 0/1 datasets and show that the basic idea of fractal dimension can be adapted to binary data. On its own, however, the fractal dimension is difficult to interpret. Hence we introduce the concept of normalized fractal dimension. For a dataset D, its normalized fractal dimension is the number of independent columns needed to achieve the same (unnormalized) fractal dimension as D. The normalized fractal dimension thus measures how much dependency structure the data contain. We study the properties of the normalized fractal dimension and discuss its computation. We give empirical results on the normalized fractal dimension, comparing it against PCA.
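The idea of normalizing by independent columns can be illustrated with a small sketch. It assumes the fractal dimension is a correlation dimension: Z(r) is the fraction of ordered row pairs (self-pairs included) whose Hamming distance is at most r, and the dimension is the least-squares slope of log Z(r) against log r over a user-chosen list of radii. It further assumes that the reference data have H independent columns, each equal to 1 with the dataset's overall density s. Two such rows differ in a column with probability q = 2s(1 - s), so their distance is Binomial(H, q). The normalized dimension is then the H whose dimension best matches that of D. The function names and the choices of radii, density, and search range are hypothetical; this is not the authors' exact estimator.

```python
import math
from itertools import product


def hamming(x, y):
    """L1 (Hamming) distance between two 0/1 rows."""
    return sum(a != b for a, b in zip(x, y))


def slope(radii, zs):
    """Least-squares slope of log Z(r) against log r.

    Assumes at least two distinct positive integer radii and every Z(r) > 0."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(z) for z in zs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def correlation_dimension(rows, radii):
    """Hypothetical estimator: Z(r) is the fraction of ordered row pairs
    (self-pairs included, so Z(r) > 0) within Hamming distance r."""
    dists = [hamming(x, y) for x, y in product(rows, repeat=2)]
    zs = [sum(d <= r for d in dists) / len(dists) for r in radii]
    return slope(radii, zs)


def independent_dimension(h, s, radii):
    """Correlation dimension of h independent columns, each 1 with probability s.

    Two random rows differ in a column with probability q = 2s(1 - s), so their
    distance is Binomial(h, q) and Z(r) is that binomial CDF."""
    q = 2 * s * (1 - s)  # q <= 0.5, so Z(r) >= (1 - q)**h > 0
    zs = [
        sum(math.comb(h, d) * q**d * (1 - q) ** (h - d) for d in range(min(r, h) + 1))
        for r in radii
    ]
    return slope(radii, zs)


def normalized_dimension(rows, radii, max_h=100):
    """Number of independent columns (1..max_h) whose correlation dimension is
    closest to that of rows, using the same radii and the same density of ones."""
    s = sum(map(sum, rows)) / (len(rows) * len(rows[0]))  # overall density of ones
    target = correlation_dimension(rows, radii)
    return min(
        range(1, max_h + 1),
        key=lambda h: abs(independent_dimension(h, s, radii) - target),
    )
```

As a sanity check, the full cube {0,1}^8 (all 256 rows) with radii (1, 2) yields a normalized dimension of 8. A dataset whose eight columns are all copies of a single bit yields 1. Pair counting is quadratic in the number of rows, so large datasets would need sampled pairs.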