Web applications such as e-commerce portals need to enrich their existing content by aggregating relevant structured data (e.g., product reviews) from external Web resources. To meet this goal, we present an algorithm for automatically extracting data records from Web pages. The algorithm uses a robust string matching technique to accurately identify the records in a Web page. Our experiments on diverse datasets (including datasets from third-party research projects) show that the proposed algorithm is highly effective and performs considerably better than two other state-of-the-art automatic data extraction systems. We have made the proposed system publicly accessible so that readers can evaluate it.
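The abstract does not specify which string matching technique the algorithm uses. As an illustration only, and not the authors' method, the sketch below shows one common way string matching can flag repeated data records: each candidate block is reduced to its sequence of tag names, and blocks whose sequences are similar are treated as records. The function names, the 0.8 threshold, and the use of Python's `difflib.SequenceMatcher` are all assumptions made for this example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical helper: captures an optional "/" and the tag name of each tag.
TAG_RE = re.compile(r"<\s*(/?)\s*([a-zA-Z][a-zA-Z0-9]*)")


def tag_sequence(fragment):
    """Return the tag names in a fragment, ignoring text and attributes."""
    return [slash + name.lower() for slash, name in TAG_RE.findall(fragment)]


def similarity(a, b):
    """Similarity in [0, 1] between the tag sequences of two fragments."""
    matcher = SequenceMatcher(None, tag_sequence(a), tag_sequence(b), autojunk=False)
    return matcher.ratio()


def find_records(blocks, threshold=0.8):
    """Return indices of blocks that resemble at least one other block.

    The threshold is an assumed value chosen for this sketch.
    """
    hits = []
    for i, block in enumerate(blocks):
        if any(
            i != j and similarity(block, other) >= threshold
            for j, other in enumerate(blocks)
        ):
            hits.append(i)
    return hits


if __name__ == "__main__":
    blocks = [
        '<div class="nav"><a href="/">Home</a></div>',
        "<li><b>Camera A</b><span>$199</span><p>Great lens.</p></li>",
        "<li><b>Camera B</b><span>$249</span><p>Solid build, <i>fast</i> focus.</p></li>",
        "<li><b>Camera C</b><span>$99</span><p>Budget pick.</p></li>",
    ]
    print(find_records(blocks))
```

Matching on tag sequences rather than raw text lets records with different content, or with small markup variations such as the extra `<i>` element above, still be grouped together, while the navigation block is left out.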