Formal concept analysis approach for data extraction from a limited deep web database

Authors:
Zhuo Zhang;Juan Du;Liming Wang
Affiliations:
School of Information Engineering, ZhengZhou University, ZhengZhou, China 450001;Information Technology Engineering, Yellow River Conservancy Technical Institute, Kaifeng, China 475003;School of Information Engineering, ZhengZhou University, ZhengZhou, China 450001
Venue:
Journal of Intelligent Information Systems
Year:
2013

Abstract

Few studies have addressed the problem of extracting data from a limited deep web database. We apply formal concept analysis (FCA) to this problem and propose a novel algorithm called EdaliwdbFCA. Before a query Y is sent, the algorithm analyzes the local formal context KL, which consists of the most recently extracted data, and predicts the size of the query result from the cardinality of the extent X of the formal concept (X,Y) derived from KL. It can therefore decide in advance whether Y should be issued as a query. Candidate query concepts are generated dynamically from the lower cover of the current concept (X,Y), so the method avoids building concrete concept lattices during extraction. In addition, two pruning rules are adopted to reduce redundant queries. Experiments were performed on controlled data sets and real applications. The results confirm that the theory behind the algorithm is correct and that the algorithm can be applied effectively in the real world.
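
The two FCA ingredients named in the abstract, estimating a query's result size from the extent of a concept in the local context and generating candidates from the lower cover of the current concept, can be illustrated with a minimal Python sketch. This is not the EdaliwdbFCA algorithm itself: the abstract does not give the decision rule or the two pruning rules, so they are omitted. The toy context `KL`, the attribute strings, and the function names `extent`, `intent`, `predicted_size`, and `lower_cover` are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A formal context as a mapping: record id -> set of attribute values.
Context = Dict[str, Set[str]]
Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent X, intent Y)


def extent(ctx: Context, attrs) -> FrozenSet[str]:
    """Records of the context that carry every attribute in attrs."""
    return frozenset(g for g, has in ctx.items() if attrs <= has)


def intent(ctx: Context, objs) -> FrozenSet[str]:
    """Attributes shared by every record in objs (all attributes if objs is empty)."""
    shared = set().union(*ctx.values()) if ctx else set()
    for g in objs:
        shared &= ctx[g]
    return frozenset(shared)


def predicted_size(ctx: Context, query) -> int:
    """Number of locally known records matching the query.

    Assumption: every record in ctx also exists in the remote database,
    so this count is a lower bound on the true result size.
    """
    return len(extent(ctx, query))


def lower_cover(ctx: Context, concept: Concept) -> List[Concept]:
    """Lower neighbours of (X, Y), computed without building the lattice.

    Each attribute m outside Y yields the concept with extent (Y + {m})';
    the lower neighbours are the candidates whose extent is maximal.
    """
    _, y = concept
    all_attrs = set().union(*ctx.values()) if ctx else set()
    candidates = {}
    for m in all_attrs - y:
        x_m = extent(ctx, y | {m})
        candidates[x_m] = intent(ctx, x_m)
    return [
        (x_m, y_m)
        for x_m, y_m in candidates.items()
        if not any(x_m < other for other in candidates)
    ]


# Hypothetical local context built from three extracted records.
KL: Context = {
    "r1": {"make=ford", "color=red"},
    "r2": {"make=ford", "color=blue"},
    "r3": {"make=audi", "color=red"},
}

top_extent = extent(KL, frozenset())
top: Concept = (top_extent, intent(KL, top_extent))
candidate_queries = [y for _, y in lower_cover(KL, top)]
```

In this toy context, the top concept has an empty intent, and its lower cover contains the two most general refinements, `{make=ford}` and `{color=red}`. The more specific intents `{make=ford, color=blue}` and `{make=audi, color=red}` are excluded because their extents are contained in larger candidate extents.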