Most data mining algorithms require the user to set many input parameters. Working with parameter-laden algorithms carries two main dangers. First, incorrect settings may cause an algorithm to fail to find the true patterns. Second, and perhaps more insidious, the algorithm may report spurious patterns that do not really exist, or greatly overestimate the significance of the patterns it reports. This is especially likely when the user does not understand the role the parameters play in the data mining process.

Data mining algorithms should have as few parameters as possible, ideally none. A parameter-free algorithm would limit our ability to impose our prejudices, expectations, and presumptions on the problem at hand, and would let the data itself speak to us. In this work, we show that recent results in bioinformatics and computational theory hold great promise for a parameter-free data mining paradigm. The results are motivated by observations in Kolmogorov complexity theory; as a practical matter, however, they can be implemented using any off-the-shelf compression algorithm with the addition of just a dozen or so lines of code. Through empirical tests on time series, DNA, text, and video datasets, we show that this approach is competitive with or superior to state-of-the-art approaches in anomaly/interestingness detection, classification, and clustering.
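To make the "off-the-shelf compressor plus a dozen lines of code" idea concrete, here is a minimal sketch. The abstract does not spell out a formula, so this assumes a compression-based dissimilarity of the ratio form C(xy) / (C(x) + C(y)), where C(·) is the compressed size in bytes. Python's standard `zlib` module stands in for "any off-the-shelf compression algorithm", and the function names `compressed_size`, `cdm`, and `classify_1nn` are hypothetical, chosen for illustration only. The intuition: if y shares structure with x, compressing them together costs little more than compressing x alone, so the ratio approaches 0.5; if they share nothing, the ratio approaches 1.0.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression at level 9.

    zlib is an assumed stand-in for "any off-the-shelf compressor".
    """
    return len(zlib.compress(data, 9))


def cdm(x: bytes, y: bytes) -> float:
    """Assumed compression-based dissimilarity: C(xy) / (C(x) + C(y)).

    Close to 0.5 when y adds little beyond x (shared structure);
    close to 1.0 when the compressor finds nothing in common.
    """
    return compressed_size(x + y) / (compressed_size(x) + compressed_size(y))


def classify_1nn(query: bytes, labeled: list[tuple[bytes, str]]) -> str:
    """Hypothetical 1-nearest-neighbour classifier using cdm as the distance."""
    if not labeled:
        raise ValueError("need at least one labeled example")
    _, label = min(labeled, key=lambda pair: cdm(query, pair[0]))
    return label


if __name__ == "__main__":
    dna = b"ACGT" * 200
    text = b"the quick brown fox jumps over the lazy dog " * 20
    print(f"cdm(dna, dna)  = {cdm(dna, dna):.3f}")
    print(f"cdm(dna, text) = {cdm(dna, text):.3f}")
    train = [
        (b"ACGT" * 100, "dna"),
        (b"the quick brown fox jumps over the lazy dog " * 10, "text"),
    ]
    print(classify_1nn(b"ACGTACGT" * 30, train))
```

Note that the only tunable here is the choice of compressor itself. One practical caveat of this sketch: zlib only looks back about 32 KB, so for longer inputs the concatenation x + y stops revealing shared structure; a compressor with a larger window (for example the standard `bz2` or `lzma` modules) would be a natural swap.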