In this paper we propose a new knowledge management task that aims to map Web pages to their corresponding records in a structured database. For example, the DBLP database contains records for many computer scientists, and most of these people have public Web pages. If we can map each database record to the appropriate Web page, the information on that page could be used to further describe the person's database record. To accomplish this goal, we employ link paths, which contain the anchor texts found along multiple paths through the Web that end at the Web page in question. We hypothesize that the information from these link paths can be used to generate an accurate mapping from Web pages to database records. Experiments on two large, real-world data sets (DBLP and IMDB as the structured data; computer science faculty members' Web pages and official movie homepages as the Web page data) show that our method does provide an accurate mapping. Finally, we conclude by issuing a call for further research on this promising new task.
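The core idea, pooling anchor text from several link paths and using it to choose a database record, can be illustrated with a toy example. The sketch below is a minimal illustration under stated assumptions, not the method evaluated in the paper. It assumes each link path is given as a list of anchor-text strings and that each record is a dictionary of text fields. It scores records by simple token overlap. The function names (`path_profile`, `score_record`, `map_page`) and the overlap scoring are hypothetical choices made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def path_profile(link_paths):
    """Pool anchor-text tokens from every link path that ends at one page.

    Hypothetical input format: link_paths is a list of paths, and each path
    is a list of anchor-text strings collected along that path.
    """
    profile = set()
    for path in link_paths:
        for anchor in path:
            profile.update(tokenize(anchor))
    return profile


def score_record(record, profile):
    """Hypothetical scoring: fraction of the record's field tokens found in the profile."""
    tokens = set(tokenize(" ".join(record.values())))
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in profile) / len(tokens)


def map_page(link_paths, records):
    """Return (record_id, score) for the best-scoring record.

    Returns (None, 0.0) if no record shares a token with the anchor text.
    """
    profile = path_profile(link_paths)
    best_id, best_score = None, 0.0
    for record_id, record in records.items():
        s = score_record(record, profile)
        if s > best_score:
            best_id, best_score = record_id, s
    return best_id, best_score


if __name__ == "__main__":
    # Made-up example data; not drawn from DBLP or IMDB.
    records = {
        "rec1": {"name": "Alice Smith", "dept": "Computer Science"},
        "rec2": {"name": "Bob Jones", "dept": "Physics"},
    }
    paths = [["Computer Science", "Faculty", "Alice Smith"], ["Dr. Smith"]]
    print(map_page(paths, records))  # ('rec1', 1.0)
```

When no record shares a token with the anchor text, the function returns `(None, 0.0)`. In that case the page stays unmapped rather than being forced onto a record. A practical system would probably also need term weighting and a minimum-score threshold.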