Active learning based frequent itemset mining over the deep web

Authors:
Tantan Liu;Gagan Agrawal
Affiliations:
Department of Computer Science and Engineering, The Ohio State University, Columbus, 43210, USA;Department of Computer Science and Engineering, The Ohio State University, Columbus, 43210, USA
Venue:
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Year:
2011

Citing 0
Cited 1

Stratified k-means clustering over a deep web data source

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

In recent years, one mode of data dissemination has become extremely popular, which is the deep web. A key characteristics of deep web data sources is that data can only be accessed through the limited query interface they support. This paper develops a methodology for mining the deep web. Because these data sources cannot be accessed directly, thus, data mining must be performed based on sampling of the datasets. The samples, in turn, can only be obtained by querying the deep web databases with specific inputs. Unlike existing sampling based methods, which are typically applied on relational databases or streaming data, sampling costs, and not the computation or memory costs, are the dominant consideration in designing the algorithm.