Automatic information extraction from web pages

Authors: Budi Rahardjo; Roland H. C. Yap
Affiliations: National Univ. of Singapore, Republic of Singapore; National Univ. of Singapore, Republic of Singapore
Venue: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Year: 2001


Abstract

Many web pages have implicit structure. In this paper, we show the feasibility of automatically extracting data from web pages by using approximate matching techniques. This can be applied to generating wrappers automatically, to detecting and displaying differences between web pages, and to monitoring web pages for changes.
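
As a rough illustration of the idea, the sketch below aligns the token sequences of two similarly structured pages and reports the runs that differ. It is not the authors' method: the abstract does not say which approximate matching technique is used, so Python's standard difflib.SequenceMatcher stands in for it here. The helper names tokenize and varying_fields and the two sample pages are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(html):
    """Split HTML into tag tokens and text tokens, dropping empty pieces."""
    parts = re.split(r"(<[^>]+>)", html)
    return [p.strip() for p in parts if p.strip()]


def varying_fields(page_a, page_b):
    """Return (old, new) token runs that differ between two similar pages.

    Tokens shared by both pages are treated as the implicit template;
    the runs that differ are candidate data fields or page changes.
    """
    a, b = tokenize(page_a), tokenize(page_b)
    # Stand-in matcher (an assumption, not the paper's algorithm).
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    fields = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            fields.append((a[i1:i2], b[j1:j2]))
    return fields


# Hypothetical sample pages that share one row template.
page_a = "<tr><td>Widget</td><td>$10</td></tr>"
page_b = "<tr><td>Gadget</td><td>$12</td></tr>"

fields = varying_fields(page_a, page_b)
for old, new in fields:
    print(old, "->", new)
```

The same alignment serves both uses named in the abstract: across two different pages from one site, the differing runs suggest where a wrapper should extract data, and across two snapshots of one page, they are the changes to report.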