In recent years, several information retrieval methods that use information about Web-links have been developed, such as HITS and Trawling. To analyze Web-links for information retrieval by dividing them into links inside each Web site (local-links) and links between Web sites (global-links), a proper model of the Web site is required, since "Web site" is a phrase used ambiguously in daily life. In existing research, a Web server is used as the model of a Web site. This idea works relatively well when a Web site corresponds to a server, as with public Web sites, but works poorly when multiple Web sites correspond to one server, as with private Web sites on rental Web servers. In this paper, we propose a new model of the Web site, the "directory-based site", to handle typical private sites, and a method to identify such sites using information about URLs and Web-links. In computational experiments, we verify that the method can determine, for about 66% of more than 110 thousand servers, whether each server hosts multiple directory-based sites, and that it extracts over 500 thousand directory-based sites and 4 million global-links. The experiments use jp-domain URL and Web-link data containing over 23 million URLs and 100 million Web-links, collected from July to August 2000 by Toyoda and Kitsuregawa. We also propose a new framework for Web-link based information retrieval that uses directory-based sites and global-links instead of Web pages and the whole set of Web-links, respectively, and we examine the effectiveness of our framework by comparing the result of Trawling on our framework with the result on the existing framework.
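The split between local-links and global-links can be illustrated with a minimal Python sketch. It is not the identification method of the paper, which is not detailed in this abstract. The sketch assumes that the set of servers hosting multiple sites is already known, and it uses the first path directory as a stand-in for the directory-based site; the names `site_key`, `split_links`, and `multi_site_hosts`, and the example URLs, are hypothetical.

```python
from urllib.parse import urlsplit


def site_key(url, multi_site_hosts):
    """Return a site identifier for a URL (illustrative rule only).

    Assumption: on a server listed in multi_site_hosts, the first path
    directory identifies the site; on any other server, the server itself
    is the site (the server-based model).
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in multi_site_hosts:
        segments = [s for s in parts.path.split("/") if s]
        # Use the first segment only when it is a directory, i.e. it is
        # followed by further segments or by a trailing slash.
        if segments and (len(segments) > 1 or parts.path.endswith("/")):
            return host + "/" + segments[0]
    return host


def split_links(links, multi_site_hosts):
    """Partition (source, destination) URL pairs into local and global links."""
    local_links, global_links = [], []
    for src, dst in links:
        same_site = site_key(src, multi_site_hosts) == site_key(dst, multi_site_hosts)
        (local_links if same_site else global_links).append((src, dst))
    return local_links, global_links


# Hypothetical data: one rental server with two private sites, one public server.
multi_site_hosts = {"rental.example.jp"}
links = [
    ("http://rental.example.jp/~alice/index.html",
     "http://rental.example.jp/~alice/photos.html"),
    ("http://rental.example.jp/~alice/index.html",
     "http://rental.example.jp/~bob/"),
    ("http://www.example.co.jp/a/x.html",
     "http://www.example.co.jp/b/y.html"),
]
local_links, global_links = split_links(links, multi_site_hosts)
```

Under the server-based model all three links would be local. With the directory-level key, the link from `~alice` to `~bob` becomes a global-link, which is the distinction the directory-based site is meant to capture.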