One of the major problems in pattern mining is the explosion of the number of results. Tight constraints reveal only common knowledge, while loose constraints return an overwhelming number of patterns, because large groups of patterns essentially describe the same set of transactions. In this paper we approach this problem using the MDL principle: the best set of patterns is the set that compresses the database best. For this task we introduce the Krimp algorithm. Experimental evaluation shows that typically only hundreds of itemsets are returned, a dramatic reduction, of up to seven orders of magnitude, in the number of frequent itemsets. These selections, called code tables, are of high quality, as shown by compression ratios, swap randomisation, and the accuracies of the code table-based Krimp classifier, all obtained on a wide range of datasets. Further, we extensively evaluate the heuristic choices made in the design of the algorithm.
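To make the MDL criterion concrete, the following is a minimal sketch of how a candidate code table could be scored by the total number of bits needed to describe both the table and the database encoded with it. It is not the Krimp algorithm itself, which also searches over candidate itemsets. The function names (`cover`, `total_encoded_size`, `singletons`), the greedy cover in table order, and the exact way the table cost is computed are assumptions chosen for illustration, as is the toy database.

```python
from collections import Counter
from math import log2


def singletons(database):
    """Hypothetical helper: one single-item itemset per distinct item."""
    items = sorted({item for t in database for item in t})
    return [frozenset([item]) for item in items]


def cover(transaction, code_table):
    """Greedily cover a transaction with non-overlapping itemsets,
    trying the code table entries in the order given (assumed order)."""
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    if remaining:
        raise ValueError("code table must contain every singleton")
    return used


def total_encoded_size(database, code_table):
    """Bits for the code table plus bits for the database encoded with it."""
    usage = Counter()
    for t in database:
        usage.update(cover(t, code_table))
    total_usage = sum(usage.values())
    # Shannon-optimal code length: frequently used itemsets get short codes.
    code_len = {x: -log2(u / total_usage) for x, u in usage.items()}
    data_bits = sum(usage[x] * code_len[x] for x in usage)

    # Itemsets in the table are spelled out with singleton-only codes
    # derived from raw item frequencies (an assumed baseline encoding).
    item_counts = Counter(item for t in database for item in t)
    n_items = sum(item_counts.values())
    item_len = {i: -log2(c / n_items) for i, c in item_counts.items()}
    # Only itemsets that are actually used contribute to the table cost.
    table_bits = sum(code_len[x] + sum(item_len[i] for i in x) for x in usage)
    return data_bits + table_bits


# Toy database: the pattern {a, b, c} occurs in most transactions.
database = [frozenset("abc")] * 8 + [frozenset("a")] * 2

baseline = total_encoded_size(database, singletons(database))
with_pattern = total_encoded_size(
    database, [frozenset("abc")] + singletons(database)
)
print(f"singletons only: {baseline:.2f} bits")
print(f"with {{a, b, c}}:  {with_pattern:.2f} bits")
```

Under these assumptions, adding the itemset {a, b, c} to the table lowers the total encoded size, so an MDL-based selection would prefer the table that contains it. A pattern that rarely occurs would instead add more bits to the table than it saves in the data, and would be rejected.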