Active learning based frequent itemset mining over the deep web

  • Authors:
  • Tantan Liu;Gagan Agrawal

  • Affiliations:
  • Department of Computer Science and Engineering, The Ohio State University, Columbus, 43210, USA;Department of Computer Science and Engineering, The Ohio State University, Columbus, 43210, USA

  • Venue:
  • ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

In recent years, one mode of data dissemination has become extremely popular, which is the deep web. A key characteristics of deep web data sources is that data can only be accessed through the limited query interface they support. This paper develops a methodology for mining the deep web. Because these data sources cannot be accessed directly, thus, data mining must be performed based on sampling of the datasets. The samples, in turn, can only be obtained by querying the deep web databases with specific inputs. Unlike existing sampling based methods, which are typically applied on relational databases or streaming data, sampling costs, and not the computation or memory costs, are the dominant consideration in designing the algorithm.