A flexible learning system for wrapping tables and lists in HTML documents
Proceedings of the 11th international conference on World Wide Web
RoadRunner: Towards Automatic Data Extraction from Large Web Sites
Proceedings of the 27th International Conference on Very Large Data Bases
Extracting Patterns and Relations from the World Wide Web
WebDB '98 Selected papers from the International Workshop on The World Wide Web and Databases
Extracting structured data from Web pages
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
On the complexity of schema inference from web pages in the presence of nullable data attributes
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Fully automatic wrapper generation for search engines
WWW '05 Proceedings of the 14th international conference on World Wide Web
Corroborate and learn facts from the web
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Open information extraction from the web
IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Unsupervised named-entity extraction from the Web: An experimental study
Artificial Intelligence
Hi-index | 0.00 |
Simplifying the key tasks of search engine users by directly retrieving to them structured knowledge according to their queries is attracting much attention from both industry and academia. A bottleneck of this challenging problem is how to extract the structured knowledge from the noisy and complex Web scale websites automatically. In this paper, we propose an unsupervised automatic wrapper induction algorithm, named as Scalable Knowledge Extractor from webSites (SKES). SKES induces the wrapper in a divide and conquer mode, i.e., it divides the general wrapper into several sub-wrappers to learn from the data independently. Moreover, through employing techniques such as tag path representation of Web pages, SKES is verified to be efficient and noise-tolerant by the experimental results. Furthermore, based on our automatically extracted knowledge, we also built a prototype to serve structured knowledge to end users for simplifying their key search tasks. Very positive feedbacks were received on the prototype.