Many applications need to extract a small set of frequent patterns that have not only high significance but also low redundancy. Significance is usually defined by the application context. Previous studies have concentrated either on computing the top-k significant patterns or on removing redundancy among patterns, treating the two separately. There is limited work on finding top-k patterns that demonstrate high significance and low redundancy simultaneously.

In this paper, we study the problem of extracting redundancy-aware top-k patterns from a large collection of frequent patterns. We first examine evaluation functions for measuring the combined significance of a pattern set and propose MMS (Maximal Marginal Significance) as the problem formulation. The problem is NP-hard. We further present a greedy algorithm that approximates the optimal solution with a performance bound of O(log k) (under conditions on redundancy), where k is the number of reported patterns. The direct use of redundancy-aware top-k patterns is illustrated through two real applications: disk block prefetch and document theme extraction. Our method can also be applied to processing redundancy-aware top-k queries in traditional databases.
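
As a rough illustration of the greedy idea, the following Python sketch picks patterns one at a time by marginal significance. It is a minimal sketch under stated assumptions, not the algorithm as specified in the paper: the marginal gain of a candidate is assumed to be its significance minus its largest redundancy with any pattern already selected, and the names `greedy_top_k`, `toy_significance`, and `toy_redundancy`, together with the toy data, are hypothetical and chosen only for this example.

```python
from typing import Callable, List, Sequence


def greedy_top_k(
    patterns: Sequence[str],
    significance: Callable[[str], float],
    redundancy: Callable[[str, str], float],
    k: int,
) -> List[str]:
    """Greedily select up to k patterns by marginal significance.

    Assumption for this sketch: the marginal gain of a candidate p is
    significance(p) minus its maximum redundancy with any selected pattern.
    """
    remaining = list(patterns)
    selected: List[str] = []

    def gain(p: str) -> float:
        # The first pick has nothing to be redundant with.
        if not selected:
            return significance(p)
        return significance(p) - max(redundancy(p, q) for q in selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=gain)  # ties go to the earliest candidate
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): each pattern is a string of single-character items.
TOY_SCORES = {"ab": 0.9, "abc": 0.85, "xy": 0.5}


def toy_significance(p: str) -> float:
    return TOY_SCORES[p]


def toy_redundancy(p: str, q: str) -> float:
    # Assumed measure: Jaccard overlap of the item sets, scaled by the
    # smaller of the two significance scores.
    a, b = set(p), set(q)
    overlap = len(a & b) / len(a | b)
    return overlap * min(toy_significance(p), toy_significance(q))


if __name__ == "__main__":
    print(greedy_top_k(list(TOY_SCORES), toy_significance, toy_redundancy, k=2))
    # prints ['ab', 'xy']
```

In this toy run, "abc" has the second-highest significance but overlaps heavily with the already selected "ab", so the less significant but non-redundant "xy" is reported instead. A plain top-k by significance would have returned "ab" and "abc".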