On data mining, compression, and Kolmogorov complexity

Authors:
Christos Faloutsos; Vasileios Megalooikonomou
Affiliations:
School of Computer Science, Carnegie Mellon University, Pittsburgh, USA 15213-3891; Department of Computer and Information Sciences, Temple University, Philadelphia, USA 19122
Venue:
Data Mining and Knowledge Discovery
Year:
2007

Abstract

Will we ever have a theory of data mining analogous to the relational algebra in databases? Why do we have so many clearly different clustering algorithms? Could data mining be automated? We show that the answer to all these questions is negative, because data mining is closely related to compression and Kolmogorov complexity, and the latter is undecidable. Therefore, data mining will always be an art, where our goal is to find better models (patterns) that fit our datasets as well as possible.
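
The link between patterns and compression can be illustrated with a small sketch. It is not taken from the paper: it assumes Python's standard zlib compressor as a stand-in, and the names compressed_size, patterned, and random_like are hypothetical. The compressed length gives only a computable upper-bound proxy for Kolmogorov complexity; the true quantity cannot be computed in general, which is the point the abstract makes.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of zlib-compressed data.

    Assumption: used here only as a rough, computable upper-bound
    proxy for the Kolmogorov complexity of `data`.
    """
    return len(zlib.compress(data, 9))


# A highly regular dataset: one short pattern repeated many times.
patterned = b"abc" * 10_000

# A dataset of the same length with no structure that a
# general-purpose compressor can exploit.
random_like = os.urandom(30_000)

patterned_size = compressed_size(patterned)
random_size = compressed_size(random_like)

print("patterned:", len(patterned), "->", patterned_size, "bytes")
print("random:   ", len(random_like), "->", random_size, "bytes")
```

Under these assumptions the patterned input shrinks to a small fraction of its original size, while the random input does not shrink. A better model of the data corresponds to a shorter description, but no single compressor is guaranteed to find the shortest one.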