With the rapid development of the Internet, data sources on the deep Web store large amounts of high-quality structured data, which calls for effective structured data extraction methods. However, existing methods focus on data rather than structure, and some of them are difficult to maintain. To address these problems, this paper proposes a complete and effective method that supports both data extraction and schema recognition. For data extraction, it adopts a novel clustering-based algorithm that remains effective on complex data with heavy noise. A simple extraction-rule model is also defined to address the maintenance problem. In addition, the method performs deeper mining to recognize the schema of the extracted results. Finally, experiments show satisfactory results.
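The abstract does not describe how its clustering algorithm works. The sketch below is only a generic illustration of how clustering can separate repeated data records from noise. It is not the paper's method. It rests on several hypothetical assumptions. Each candidate record node is reduced to the sequence of its child tag names. Structural similarity is Python's `difflib.SequenceMatcher` ratio. The threshold of 0.8 and the minimum cluster size of 2 are arbitrary. The names `cluster_records` and `split_records_and_noise` are invented for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical representation: a candidate record is (node_id, child_tag_sequence).
Candidate = tuple[str, list[str]]


def similarity(a: list[str], b: list[str]) -> float:
    """Structural similarity in [0, 1] between two child-tag sequences."""
    return SequenceMatcher(None, a, b).ratio()


def cluster_records(candidates: list[Candidate], threshold: float = 0.8) -> list[list[Candidate]]:
    """Greedy single-pass clustering (an assumption, not the paper's algorithm):
    a candidate joins the first cluster whose representative (its first member)
    is at least `threshold` similar; otherwise it starts a new cluster."""
    clusters: list[list[Candidate]] = []
    for cand in candidates:
        for cluster in clusters:
            if similarity(cand[1], cluster[0][1]) >= threshold:
                cluster.append(cand)
                break
        else:
            clusters.append([cand])
    return clusters


def split_records_and_noise(clusters: list[list[Candidate]], min_size: int = 2):
    """Treat clusters of repeated structure as data records and small clusters as noise."""
    records = [c for c in clusters if len(c) >= min_size]
    noise = [c for c in clusters if len(c) < min_size]
    return records, noise


if __name__ == "__main__":
    candidates = [
        ("li1", ["a", "img", "span", "span"]),
        ("li2", ["a", "img", "span", "span"]),
        ("li3", ["a", "img", "span"]),  # record with a missing optional field
        ("ad", ["div", "script"]),      # e.g. an advertisement block (noise)
    ]
    records, noise = split_records_and_noise(cluster_records(candidates))
    print([[node_id for node_id, _ in c] for c in records])  # [['li1', 'li2', 'li3']]
    print([[node_id for node_id, _ in c] for c in noise])    # [['ad']]
```

The record missing a field (`li3`) still scores 6/7 against its cluster's representative, so it joins the record cluster, while the unrelated block is isolated as noise. Comparing only against the first member makes the result order-dependent. A fuller system might use hierarchical clustering or tree edit distance instead.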