Frequency-based views to pattern collections

Authors:
Taneli Mielikäinen
Affiliations:
HIIT Basic Research Unit, Department of Computer Science, University of Helsinki, P.O. Box 68 (Gustaf Hällströmin katu 2b), FIN-00014, Finland
Venue:
Discrete Applied Mathematics - Special issue: Discrete mathematics & data mining II (DM & DM II)
Year:
2006

Abstract

Finding interesting patterns in data is one of the most important problems in data mining, and it has been studied actively for more than a decade. However, which patterns are interesting and which are not is still a largely open problem. The usual approach to detecting the interesting patterns (in a predefined class of patterns) is to determine quality values for potentially interesting patterns and to deem a pattern interesting if its quality value (i.e., the interestingness of the pattern) is higher than a given threshold value. Even so, it is very difficult to find a threshold value and a way to determine the quality values such that the collection of patterns with quality values greater than the threshold contains almost all truly interesting patterns and only a few uninteresting ones. To enable more accurate characterization of interesting patterns, the use of constraints to further prune the pattern collection has been proposed. However, most constrained pattern discovery research has focused on structural constraints for the pattern collections and the patterns. We take a complementary approach and focus on constraining the quality values of the patterns: we propose quality value simplifications as a complement to structural constraints. As a special case of quality value simplifications, we consider discretizing the quality values. We analyze the worst-case error of certain discretization functions and give efficient discretization algorithms minimizing several loss functions. In addition, we show that discretizations of the quality values can be used to obtain small approximate condensed representations for collections of interesting patterns. We evaluate the proposed condensation approach experimentally using frequent itemsets.
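To make the idea of discretizing quality values under a worst-case error bound concrete, here is a minimal sketch. It is not taken from the paper: the function name `discretize_max_error`, the parameter `eps`, and the sample frequencies are all hypothetical. It uses a standard greedy interval-covering pass, which for a given maximum absolute error `eps` uses as few representative values as possible; the paper's own algorithms and loss functions may differ.

```python
from typing import Dict, List, Sequence


def discretize_max_error(values: Sequence[float], eps: float) -> Dict[float, float]:
    """Map each quality value to a representative with |value - rep| <= eps.

    Illustrative assumption: quality values are plain floats (e.g. itemset
    frequencies) and the loss is the maximum absolute error.
    """
    if eps < 0:
        raise ValueError("eps must be non-negative")

    mapping: Dict[float, float] = {}
    group: List[float] = []  # current run of sorted values sharing one representative

    def flush() -> None:
        # The midpoint of the run keeps every member within half the run's width.
        rep = (group[0] + group[-1]) / 2
        for g in group:
            mapping[g] = rep

    for v in sorted(set(values)):
        # Start a new run once the value no longer fits in a window of width 2 * eps.
        if group and v - group[0] > 2 * eps:
            flush()
            group = []
        group.append(v)
    if group:
        flush()
    return mapping


# Hypothetical itemset frequencies, for illustration only.
frequencies = [0.10, 0.12, 0.13, 0.30, 0.31, 0.50]
simplified = discretize_max_error(frequencies, eps=0.02)
for f in frequencies:
    print(f, "->", simplified[f])
```

After discretization, many patterns share the same quality value, which is what makes a smaller approximate representation of the pattern collection plausible. The sketch only shows the value-mapping step, not the condensation itself.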