The vast majority of data mining algorithms require the setting of many input parameters. The dangers of working with parameter-laden algorithms are twofold. First, incorrect settings may cause an algorithm to fail to find the true patterns. Second, and perhaps more insidiously, the algorithm may report spurious patterns that do not really exist, or greatly overestimate the significance of the reported patterns. This is especially likely when the user fails to understand the role of parameters in the data mining process. Data mining algorithms should therefore have as few parameters as possible. A parameter-light algorithm limits our ability to impose our prejudices, expectations, and presumptions on the problem at hand, and lets the data itself speak to us. In this work, we show that recent results in bioinformatics, learning, and computational theory hold great promise for a parameter-light data mining paradigm. The results are strongly connected to Kolmogorov complexity theory. However, as a practical matter, they can be implemented using any off-the-shelf compression algorithm with the addition of just a dozen lines of code. We show, with empirical tests on time series, DNA, text, XML, and video datasets, that this approach is competitive with or superior to many state-of-the-art approaches in anomaly/interestingness detection, classification, and clustering. As further evidence of the advantages of our method, we demonstrate its effectiveness in solving a real-world classification problem: recommending printing services and products.
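To make the "dozen lines of code" claim concrete, here is a minimal sketch of one common form of compression-based dissimilarity: the compressed size of two concatenated inputs divided by the sum of their individual compressed sizes. The abstract does not give a formula, so the ratio itself, the choice of `zlib` as the off-the-shelf compressor, and the function names `compressed_size`, `compression_dissimilarity`, and `nearest_neighbor_label` are all assumptions made for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def compression_dissimilarity(x: bytes, y: bytes) -> float:
    """Illustrative dissimilarity: C(xy) / (C(x) + C(y)).

    If x and y share structure, compressing them together costs little
    more than compressing one of them, so the ratio drops well below 1.
    Unrelated inputs give a ratio close to 1.
    """
    return compressed_size(x + y) / (compressed_size(x) + compressed_size(y))


def nearest_neighbor_label(query: bytes, labeled: list[tuple[bytes, str]]) -> str:
    """One-nearest-neighbor classification using the dissimilarity above."""
    closest = min(labeled, key=lambda item: compression_dissimilarity(query, item[0]))
    return closest[1]
```

The only choice left to the user in this sketch is the compressor; the same dissimilarity could feed an off-the-shelf clustering routine or an anomaly score. Inputs such as time series would first need to be serialized to bytes, and how that is done is outside what the abstract describes.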