Adaptive record extraction from web pages

Authors:
Justin Park;Denilson Barbosa
Affiliations:
University of Calgary, Calgary, AB, Canada;University of Calgary, Calgary, AB, Canada
Venue:
Proceedings of the 16th international conference on World Wide Web
Year:
2007

Citing 7

Tree pattern matching (Pattern matching algorithms)
A brief survey of web data extraction tools (ACM SIGMOD Record)
RoadRunner: Towards Automatic Data Extraction from Large Web Sites (Proceedings of the 27th International Conference on Very Large Data Bases)
Mining data records in Web pages (Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining)
Automatic web news extraction using tree edit distance (Proceedings of the 13th international conference on World Wide Web)
Fully automatic wrapper generation for search engines (WWW '05 Proceedings of the 14th international conference on World Wide Web)
Web data extraction based on partial tree alignment (WWW '05 Proceedings of the 14th international conference on World Wide Web)

Cited 3

Facilitating wrapper generation with page analysis (ISI'09 Proceedings of the 2009 IEEE international conference on Intelligence and security informatics)
Online social network profile data extraction for vulnerability analysis (International Journal of Internet Technology and Secured Transactions)
TEX: An efficient and effective unsupervised Web information extractor (Knowledge-Based Systems)

Abstract

We describe an adaptive method for extracting records from web pages. Our algorithm combines a weighted tree matching metric with clustering to obtain data extraction patterns. We compare our method experimentally to the state of the art and show that our approach is very competitive for rigidly structured records (such as product descriptions) and far superior for loosely structured records (such as entries on blogs).
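
The abstract does not spell out the metric or the clustering procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a depth-weighted variant of Simple Tree Matching (nodes closer to the root count more) and a greedy threshold clustering of candidate subtrees. The names `Node`, `match`, `similarity`, and `cluster`, the `decay` weight, and the `threshold` value are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A DOM-like tree node identified only by its tag name."""
    tag: str
    children: list = field(default_factory=list)


def size(node, depth=0, decay=0.5):
    """Weighted node count: a node at depth d contributes decay**d."""
    return decay ** depth + sum(size(c, depth + 1, decay) for c in node.children)


def match(a, b, depth=0, decay=0.5):
    """Depth-weighted Simple Tree Matching (the weighting is an assumption)."""
    if a.tag != b.tag:
        return 0.0
    m, n = len(a.children), len(b.children)
    # table[i][j]: best order-preserving matching between the first i
    # children of a and the first j children of b.
    table = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            paired = table[i - 1][j - 1] + match(
                a.children[i - 1], b.children[j - 1], depth + 1, decay
            )
            table[i][j] = max(table[i - 1][j], table[i][j - 1], paired)
    return decay ** depth + table[m][n]


def similarity(a, b):
    """Normalize the weighted match to [0, 1]; identical trees score 1.0."""
    return 2 * match(a, b) / (size(a) + size(b))


def cluster(trees, threshold=0.7):
    """Greedy clustering: join the first cluster whose representative is similar enough."""
    clusters = []
    for tree in trees:
        for group in clusters:
            if similarity(group[0], tree) >= threshold:
                group.append(tree)
                break
        else:
            clusters.append([tree])
    return clusters


# Two blog-entry-like subtrees that differ by one paragraph, plus an unrelated table.
entry_a = Node("div", [Node("h2"), Node("p")])
entry_b = Node("div", [Node("h2"), Node("p"), Node("p")])
other = Node("table", [Node("tr")])

clusters = cluster([entry_a, entry_b, other])
print([len(group) for group in clusters])  # prints [2, 1]
```

In this sketch the two entries land in the same cluster despite the extra paragraph, which is the kind of tolerance that loosely structured records would need; each resulting cluster could then serve as the basis for an extraction pattern.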