Multiple sections extraction using visual cue

Authors:
Derren Wong;Jer Lang Hong
Affiliations:
School of Computing and IT, Taylor's University, Malaysia;School of Computing and IT, Taylor's University, Malaysia
Venue:
ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part V
Year:
2012

Abstract

Current wrappers cannot extract multiple sections data records from search engine results pages because the sections usually have complicated layouts and structures. Extracting data from search engine results pages is important for meta search engine applications and for evaluating comparative shopping lists. In this paper, we present a novel data extraction technique that uses visual cues to check for structural regularity in multiple sections data records. Our findings show that although multiple sections data records appear to have no regular structure, visual cues reveal regularity in their structure. Our technique can serve as a model for future multiple sections data extraction, and it will be useful for meta search engine applications, which need an accurate tool to locate their sources of information.