Interactive pattern mining on hidden data: a sampling-based solution

Authors:
Mansurul Bhuiyan;Snehasis Mukhopadhyay;Mohammad Al Hasan
Affiliations:
Indiana University - Purdue University, Indianapolis (IUPUI), Indianapolis, IN, USA;Indiana University - Purdue University, Indianapolis (IUPUI), Indianapolis, IN, USA;Indiana University - Purdue University, Indianapolis (IUPUI), Indianapolis, IN, USA
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Abstract

Mining frequent patterns from a hidden dataset is an important task with various real-life applications. In this research, we propose a solution to this problem based on Markov Chain Monte Carlo (MCMC) sampling of frequent patterns. Instead of returning all the frequent patterns, the proposed paradigm returns a small set of randomly selected patterns so that the confidentiality of the dataset can be maintained. Our solution also allows interactive sampling, so that the sampled patterns can effectively meet the user's requirements. We present experimental results on several real-life datasets to validate the capability and usefulness of our solution. In particular, we show by example that, using our proposed solution, an eCommerce marketplace can allow pattern mining on user session data without disclosing the data to the public; such a mining paradigm helps the sellers of the marketplace, which eventually boosts the marketplace's own revenue.
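
As a rough illustration of what MCMC sampling of frequent patterns can look like, the sketch below runs a Metropolis-Hastings random walk over frequent itemsets and returns a handful of sampled patterns instead of the full result set. It is not the authors' algorithm: the abstract does not specify the state space, the proposal, or the target distribution, so the itemset setting, the uniform target, and all names (`support`, `neighbors`, `sample_patterns`, `minsup`, `steps`) are assumptions made for this example. The interactive feedback component is not modelled.

```python
import random


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def neighbors(itemset, items, transactions, minsup):
    """Frequent itemsets reachable by adding or removing exactly one item."""
    result = []
    for item in items:
        candidate = itemset ^ {item}  # toggle membership of one item
        if support(candidate, transactions) >= minsup:
            result.append(candidate)
    return result


def sample_patterns(transactions, minsup, steps, n_samples, seed=0):
    """Metropolis-Hastings walk with an (assumed) uniform target distribution.

    The walk starts at the empty itemset and records its current state
    every `steps` moves, so only `n_samples` patterns are ever returned.
    """
    rng = random.Random(seed)
    items = sorted(set().union(*transactions))
    current = frozenset()
    cur_nbrs = neighbors(current, items, transactions, minsup)
    if not cur_nbrs:  # no frequent single item: the walk cannot move
        return [current] * n_samples

    samples = []
    for step in range(steps * n_samples):
        proposal = rng.choice(cur_nbrs)
        prop_nbrs = neighbors(proposal, items, transactions, minsup)
        # Acceptance ratio deg(current) / deg(proposal) corrects the bias
        # of a plain random walk toward high-degree patterns.
        if rng.random() < min(1.0, len(cur_nbrs) / len(prop_nbrs)):
            current, cur_nbrs = proposal, prop_nbrs
        if (step + 1) % steps == 0:
            samples.append(current)
    return samples


if __name__ == "__main__":
    # Toy session data, invented for this example.
    sessions = [
        frozenset({"phone", "case", "charger"}),
        frozenset({"phone", "case"}),
        frozenset({"phone", "charger"}),
        frozenset({"laptop", "charger"}),
    ]
    for pattern in sample_patterns(sessions, minsup=2, steps=50, n_samples=3):
        print(sorted(pattern), support(pattern, sessions))
```

In this sketch the sampler touches the data only through `support`, so a data owner could in principle keep the transactions private and expose just the sampled patterns. The empty itemset is part of the state space here and may appear among the samples.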