Extracting redundancy-aware top-k patterns

Authors:
Dong Xin;Hong Cheng;Xifeng Yan;Jiawei Han
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL
Venue:
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2006

Citing 18
Cited 26

Algorithms for clustering data

Algorithms for clustering data
The discrete p-maxian location problem

Computers and Operations Research
Class-based n-gram models of natural language

Computational Linguistics
The use of MMR, diversity-based reranking for reordering documents and producing summaries

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Finding subsets maximizing minimum structures

Proceedings of the sixth annual ACM-SIAM symposium on Discrete algorithms
What Makes Patterns Interesting in Knowledge Discovery Systems

IEEE Transactions on Knowledge and Data Engineering
Mining All Non-derivable Frequent Itemsets

PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
Evaluating Top-k Selection Queries

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Selecting the right interestingness measure for association patterns

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining Top-K Frequent Closed Patterns without Minimum Support

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Approximating a collection of frequent sets

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Interestingness of frequent itemsets using Bayesian networks as background knowledge

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Active feedback in ad hoc information retrieval

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Discovering evolutionary theme patterns from text: an exploration of temporal text mining

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Summarizing itemset patterns: a profile-based approach

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Mining compressed frequent-pattern sets

VLDB '05 Proceedings of the 31st international conference on Very large data bases
C-Miner: Mining Block Correlations in Storage Systems

FAST '04 Proceedings of the 3rd USENIX Conference on File and Storage Technologies
Approximation algorithms for maximum dispersion

Operations Research Letters

From frequent itemsets to semantically meaningful visual patterns

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Community systems research at Yahoo!

ACM SIGMOD Record
Efficiently answering top-k typicality queries on large databases

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
On profiling blogs with representative entries

Proceedings of the second workshop on Analytics for noisy unstructured text data
Effective and efficient itemset pattern summarization: regression-based approaches

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Sliding-window top-k queries on uncertain streams

Proceedings of the VLDB Endowment
Representative entry selection for profiling blogs

Proceedings of the 17th ACM conference on Information and knowledge management
It takes variety to make a world: diversification in recommender systems

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Top-k typicality queries and efficient query answering methods on large databases

The VLDB Journal — The International Journal on Very Large Data Bases
Cartesian contour: a concise representation for a collection of frequent sets

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Mining problem-solving strategies from HCI data

ACM Transactions on Computer-Human Interaction (TOCHI)
Semantic-based pruning of redundant and uninteresting frequent geographic patterns

Geoinformatica
Block interaction: a generative summarization scheme for frequent patterns

Proceedings of the ACM SIGKDD Workshop on Useful Patterns
Sliding-window top-k queries on uncertain streams

The VLDB Journal — The International Journal on Very Large Data Bases
Functional proteomic pattern identification under low dose ionizing radiation

Artificial Intelligence in Medicine
Making interval-based clustering rank-aware

Proceedings of the 14th International Conference on Extending Database Technology
A novel evolutionary method to search interesting association rules by keywords

Expert Systems with Applications: An International Journal
Discovering patterns for prognostics: a case study in prognostics of train wheels

IEA/AIE'11 Proceedings of the 24th international conference on Industrial engineering and other applications of applied intelligent systems conference on Modern approaches in applied intelligence - Volume Part I
Searching interesting association rules based on evolutionary computation

PAKDD'11 Proceedings of the 15th international conference on New Frontiers in Applied Data Mining
Finding minimum representative pattern sets

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
A pattern discovery model for effective text mining

MLDM'12 Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition
A bayesian scoring technique for mining predictive and non-spurious rules

ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II
Adaptive Study Design Through Semantic Association Rule Analysis

International Journal of Software Science and Computational Intelligence
Diversity maximization under matroid constraints

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Redundancy-aware maximal cliques

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Anytime algorithms for mining groups with maximum coverage

AusDM '12 Proceedings of the Tenth Australasian Data Mining Conference - Volume 134

Abstract

Many applications need to extract a small set of frequent patterns that have not only high significance but also low redundancy. Significance is usually defined by the application context. Previous studies have concentrated on either computing top-k significant patterns or removing redundancy among patterns, treating the two tasks separately. There is limited work on finding top-k patterns that demonstrate high significance and low redundancy simultaneously.

In this paper, we study the problem of extracting redundancy-aware top-k patterns from a large collection of frequent patterns. We first examine the evaluation functions for measuring the combined significance of a pattern set and propose MMS (Maximal Marginal Significance) as the problem formulation. The problem is known to be NP-hard. We further present a greedy algorithm that approximates the optimal solution with performance bound O(log k) (with conditions on redundancy), where k is the number of reported patterns. The direct usage of redundancy-aware top-k patterns is illustrated through two real applications: disk block prefetch and document theme extraction. Our method can also be applied to processing redundancy-aware top-k queries in traditional databases.
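
The greedy idea mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so everything specific here is an assumption made for illustration: patterns are itemsets, redundancy between two patterns is taken as their Jaccard similarity scaled by the smaller of the two significance scores, and the marginal gain of a candidate is its significance minus its largest redundancy with any already selected pattern. The names `jaccard_distance`, `redundancy`, and `greedy_top_k` are hypothetical and are not taken from the paper.

```python
from typing import Dict, FrozenSet, List

Pattern = FrozenSet[str]


def jaccard_distance(a: Pattern, b: Pattern) -> float:
    """Distance in [0, 1]; 0 means identical itemsets (assumed measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def redundancy(p: Pattern, q: Pattern, significance: Dict[Pattern, float]) -> float:
    """Assumed redundancy: similarity scaled by the smaller significance."""
    similarity = 1.0 - jaccard_distance(p, q)
    return similarity * min(significance[p], significance[q])


def greedy_top_k(significance: Dict[Pattern, float], k: int) -> List[Pattern]:
    """Pick up to k patterns, each time taking the largest marginal gain."""
    # Sort candidates so that ties are broken deterministically.
    remaining = sorted(significance, key=lambda p: sorted(p))
    selected: List[Pattern] = []

    def gain(p: Pattern) -> float:
        if not selected:
            return significance[p]
        worst_overlap = max(redundancy(p, q, significance) for q in selected)
        return significance[p] - worst_overlap

    while remaining and len(selected) < k:
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up for illustration only).
toy_significance = {
    frozenset({"a", "b"}): 0.9,
    frozenset({"a", "b", "c"}): 0.8,
    frozenset({"d", "e"}): 0.5,
}
top2 = greedy_top_k(toy_significance, 2)
```

With these toy scores, a plain top-2 ranking by significance would return the two overlapping itemsets `{a, b}` and `{a, b, c}`. The sketch instead returns `{a, b}` followed by `{d, e}`, because the second itemset loses most of its score to redundancy with the first. The O(log k) bound stated in the abstract applies to the authors' algorithm under their conditions on redundancy, not necessarily to this simplified variant.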