IEPAD: information extraction based on pattern discovery
Proceedings of the 10th international conference on World Wide Web
DEByE - Date extraction by example
Data & Knowledge Engineering
Mining data records in Web pages
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Fully automatic wrapper generation for search engines
WWW '05 Proceedings of the 14th international conference on World Wide Web
Web data extraction based on partial tree alignment
WWW '05 Proceedings of the 14th international conference on World Wide Web
Thresher: automating the unwrapping of semantic content from the World Wide Web
WWW '05 Proceedings of the 14th international conference on World Wide Web
ViPER: augmenting automatic information extraction with visual perceptions
Proceedings of the 14th ACM international conference on Information and knowledge management
Automatic extraction of dynamic record sections from search engine result pages
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Webpage segmentation for extracting images and their surrounding contextual information
MM '09 Proceedings of the 17th ACM international conference on Multimedia
Information extraction for search engines using fast heuristic techniques
Data & Knowledge Engineering
Multiple sections extraction using visual cue
ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part V
Hi-index | 0.00 |
In this paper, we develop an automatic wrapper for the extraction of multiple sections data records from search engine results pages. In the Information Extraction world, less attention has been focused on the development of wrappers for the extraction of multiple sections data records. This is evidenced by the fact that there is only one automatic wrapper, MSE developed for this purpose. Using the separation distance of data records and sections, MSE is able to distinguish sections and data records and extract them from search engine results pages. In this study, our approach is the use of DOM tree properties to develop an adaptive search method which is able to detect, differentiate, and partition sections and data records. The multiple sections data records labeled are used to pass through a few filtering stages, each filter is designed to filter out a particular group of irrelevant data until one data region containing the relevant records is found. Our filtering rules are designed based on visual cue such as text and image size obtained from the browser rendering engine. Experimental results show that our wrapper is able to obtain better results than the currently available MSE wrapper.