Formal concept analysis approach for data extraction from a limited deep web database

  • Authors:
  • Zhuo Zhang;Juan Du;Liming Wang

  • Affiliations:
  • School of Information Engineering, ZhengZhou University, ZhengZhou, China 450001;Information Technology Engineering, Yellow River Conservancy Technical Institute, Kaifeng, China 475003;School of Information Engineering, ZhengZhou University, ZhengZhou, China 450001

  • Venue:
  • Journal of Intelligent Information Systems
  • Year:
  • 2013

Abstract

Few studies have addressed the problem of extracting data from a limited deep web database. We apply formal concept analysis to this problem and propose a novel algorithm called EdaliwdbFCA. Before a query Y is sent, the algorithm analyzes the local formal context KL, which consists of the most recently extracted data, and predicts the size of the query result from the cardinality of the extent X of the formal concept (X,Y) derived from KL. It can thus be determined in advance whether Y should be issued as a query. Candidate query concepts are generated dynamically from the lower cover of the current concept (X,Y), so the method avoids building a concrete concept lattice during extraction. In addition, two pruning rules are adopted to reduce redundant queries. Experiments were performed on controlled data sets and real applications. The results confirm that the theory underlying the algorithm is correct and that the algorithm can be applied effectively in the real world.
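The two formal concept analysis operations the abstract relies on, computing the extent X of a query Y in the local context and generating the lower cover of a concept (X,Y) without building the whole lattice, can be illustrated with a minimal Python sketch. This is not the EdaliwdbFCA algorithm itself: the function names, the toy context and its record identifiers are all assumptions made for illustration, and the abstract does not state the rule that turns the predicted size into a send-or-skip decision, nor the two pruning rules, so neither is modeled here.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical representation of a local formal context:
# record id -> set of attribute-value pairs seen in already extracted data.
Context = Dict[str, Set[str]]


def extent(context: Context, attrs: FrozenSet[str]) -> FrozenSet[str]:
    """Records in the local context that carry every attribute in attrs."""
    return frozenset(obj for obj, row in context.items() if attrs <= row)


def intent(context: Context, objects: FrozenSet[str]) -> FrozenSet[str]:
    """Attributes shared by all given records (all attributes if none given)."""
    if not objects:
        return frozenset().union(*context.values())
    return frozenset.intersection(*(frozenset(context[o]) for o in objects))


def closure(context: Context, attrs: FrozenSet[str]) -> FrozenSet[str]:
    """Smallest concept intent containing attrs."""
    return intent(context, extent(context, attrs))


def lower_cover(context: Context, y: FrozenSet[str]) -> List[FrozenSet[str]]:
    """Intents of the lower neighbours of the concept whose intent is y.

    Each candidate is the closure of y plus one new attribute; the lower
    neighbours are the candidates that are minimal under set inclusion.
    """
    all_attrs = frozenset().union(*context.values())
    candidates = {closure(context, y | {a}) for a in all_attrs - y}
    return [c for c in candidates if not any(d < c for d in candidates)]


# Toy local context (made-up data, assumed for illustration only).
K_L: Context = {
    "r1": {"brand=Acme", "color=red"},
    "r2": {"brand=Acme", "color=blue"},
    "r3": {"brand=Bolt", "color=red"},
}

Y = frozenset({"brand=Acme"})
X = extent(K_L, Y)
print(len(X))  # cardinality of the extent, the quantity used to predict result size
for child in lower_cover(K_L, Y):
    print(sorted(child), len(extent(K_L, child)))
```

Because the lower cover is computed on demand from the current intent, only the neighbours of the concept being examined are ever materialized, which is the sense in which a sketch like this avoids constructing the full lattice.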