This paper proposes a Web information extraction system, based on a label library, for extracting information from data-intensive web pages. The system downloads dynamic web pages guided by a knowledge database, converts them to XML documents after preprocessing, mines data regions using the MDR repeated-pattern discovery algorithm, recognizes the structure of those regions and extracts data from them with a novel hierarchical pattern recognition and data extraction algorithm based on the label library, and stores the extracted data in the knowledge database to support further information extraction. Experiments showed that the system achieves high precision and adapts to web pages from different domains and with different structures.
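
The stages described above can be illustrated with a minimal sketch. It is not the paper's algorithm and not an implementation of MDR: exact matching of tag-structure signatures stands in for MDR's repeated-pattern discovery, and a small dictionary stands in for the label library. All names here (`LABEL_LIBRARY`, `signature`, `find_data_regions`, `extract_record`, `extract`) and the sample page are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical label library: maps label text found on pages
# to canonical field names (an assumption, not the paper's data).
LABEL_LIBRARY = {"title": "title", "name": "title",
                 "price": "price", "cost": "price"}


def signature(node):
    """Tag structure of a subtree, ignoring text and attributes."""
    return (node.tag, tuple(signature(child) for child in node))


def find_data_regions(root, min_records=2):
    """Return runs of adjacent siblings that share one tag structure.

    Simplification: exact signature equality replaces the similarity
    comparison a real repeated-pattern miner would use.
    """
    def is_region(run):
        # Require enough records, and records that contain child fields.
        return len(run) >= min_records and len(run[0]) > 0

    regions = []
    for parent in root.iter():
        run = []
        for child in parent:
            if run and signature(child) == signature(run[-1]):
                run.append(child)
            else:
                if is_region(run):
                    regions.append(run)
                run = [child]
        if is_region(run):
            regions.append(run)
    return regions


def extract_record(record):
    """Map 'Label: value' texts in a record to library field names."""
    fields = {}
    for node in record.iter():
        text = (node.text or "").strip()
        if ":" not in text:
            continue
        label, value = text.split(":", 1)
        field = LABEL_LIBRARY.get(label.strip().lower())
        if field is not None:
            fields[field] = value.strip()
    return fields


def extract(xml_text):
    """Run the sketch end to end on an already-preprocessed XML page."""
    root = ET.fromstring(xml_text)
    return [extract_record(record)
            for region in find_data_regions(root)
            for record in region]


# Hypothetical page, already converted to well-formed XML.
PAGE = """<html><body><h1>Books</h1><ul>
<li><span>Title: Dune</span><span>Price: 9.99</span></li>
<li><span>Name: Emma</span><span>Cost: 4.50</span></li>
</ul></body></html>"""

if __name__ == "__main__":
    for row in extract(PAGE):
        print(row)
```

In this sketch the two `li` elements form one data region because their tag structures match, and the label dictionary maps the differing labels ("Title"/"Name", "Price"/"Cost") to the same fields. Downloading pages, preprocessing them to XML, and writing results back to a knowledge database are left out.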