Analysis and improvement of HITS algorithm for detecting Web communities

  • Authors:
  • Saeko Nomura; Satoshi Oyama; Tetsuo Hayamizu; Toru Ishida

  • Affiliations:
  • Department of Social Informatics, Kyoto University, Kyoto, 606-8501 Japan (all four authors)

  • Venue:
  • Systems and Computers in Japan
  • Year:
  • 2004

Abstract

This paper analyzes Kleinberg's HITS (hyperlink-induced topic search) algorithm, which extracts Web communities by analyzing the hyperlink structure inherent to the Web, and proposes an improvement. To support the analysis, the authors developed a tool, Link Viewer, that visualizes the operation of the HITS algorithm. The analysis revealed the following problem: when the base set contains pages that are entirely unrelated to the original topic but form a dense link structure, HITS cannot extract the Web community (authorities and hubs) that matches the original topic. This is the topic drift problem. Restricting themselves to link analysis alone, the authors propose two modifications: (1) a technique that takes the projection onto the root subspace into account in the eigenvalue calculation; (2) a technique that runs the iterative calculation only on those pages of the base set that have link relations to multiple pages in the root set. A technique combining (1) and (2) is also considered. The authors report that, as a result, the topic drift problem is avoided for any topic with a relatively small amount of computation, and that the HITS algorithm is improved using link information alone. © 2004 Wiley Periodicals, Inc. Syst Comp Jpn, 35(13): 32–42, 2004; Published online in Wiley InterScience. DOI 10.1002/scj.10425
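
To make the topic drift problem and modification (2) concrete, the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not give their exact procedure, so the function names (`hits`, `normalize`, `filter_base_set`), the `min_root_links` threshold of 2, the choice to count links in either direction, and the toy graph are all assumptions made for illustration. Modification (1), the projection onto the root subspace, is not sketched here because the abstract does not describe it in enough detail.

```python
import math


def normalize(scores):
    """Scale a score dictionary to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {p: (v / norm if norm else 0.0) for p, v in scores.items()}


def hits(links, iterations=50):
    """Plain HITS by power iteration.

    links maps each page to the set of pages it links to.
    Returns (authority, hub) score dictionaries.
    """
    pages = set(links)
    for targets in links.values():
        pages |= set(targets)
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub score: sum of authority scores of the pages linked to.
        new_hub = {p: sum(new_auth[d] for d in links.get(p, ())) for p in pages}
        auth, hub = normalize(new_auth), normalize(new_hub)
    return auth, hub


def filter_base_set(links, root, min_root_links=2):
    """Assumed reading of modification (2): keep the root pages plus any
    other page linked (in either direction) with at least
    min_root_links distinct root pages, and drop all other pages."""
    pages = set(links)
    for targets in links.values():
        pages |= set(targets)
    root_neighbours = {p: set() for p in pages}
    for src, targets in links.items():
        for dst in targets:
            if src == dst:
                continue
            if dst in root:
                root_neighbours[src].add(dst)
            if src in root:
                root_neighbours[dst].add(src)
    keep = {p for p in pages
            if p in root or len(root_neighbours[p]) >= min_root_links}
    return {src: {d for d in targets if d in keep}
            for src, targets in links.items() if src in keep}


# Hypothetical toy base set: r1..r3 are root pages, h1 and h2 are on-topic
# hubs, and x1..x5 form a dense off-topic cluster reached only through r3.
root = {"r1", "r2", "r3"}
cluster = ["x1", "x2", "x3", "x4", "x5"]
links = {
    "h1": {"r1", "r2", "r3"},
    "h2": {"r1", "r2"},
    "r3": set(cluster),
}
for x in cluster:
    links[x] = {y for y in cluster if y != x}

auth, _ = hits(links)
print("plain HITS top authority:", max(auth, key=auth.get))

filtered_auth, _ = hits(filter_base_set(links, root))
print("filtered top authority:", max(filtered_auth, key=filtered_auth.get))
```

In this toy graph the densely interlinked `x` pages dominate the principal eigenvector, so plain HITS ranks an off-topic page as the top authority. Each `x` page is linked with only one root page (`r3`), so the filter removes the whole cluster and the top authority becomes one of the root pages `r1` or `r2`.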