Deep Web content is accessed through queries submitted to Web databases, and the returned data records are wrapped in dynamically generated Web pages (called deep Web pages in this paper). Extracting structured data from deep Web pages is a challenging problem because of the intricate underlying structure of such pages. A large number of techniques have been proposed to address this problem, but all of them have inherent limitations because they depend on the Web page programming language. Because Web pages are a popular two-dimensional medium, their contents are always displayed in a regular layout for users to browse. This motivates us to seek a different approach to deep Web data extraction, one that overcomes the limitations of previous work by exploiting common visual features of deep Web pages. In this paper, we propose a novel vision-based approach that is independent of the Web page programming language. The approach primarily uses the visual features of deep Web pages to perform deep Web data extraction, including data record extraction and data item extraction. We also propose a new evaluation measure, revision, to capture the amount of human effort needed to produce perfect extraction. Our experiments on a large set of Web databases show that the proposed vision-based approach is highly effective for deep Web data extraction.