Extracting Related Words from Anchor Text Clusters by Focusing on the Page Designer's Intention

  • Authors:
  • Jianquan Liu;Hanxiong Chen;Kazutaka Furuse;Nobuo Ohbo

  • Affiliations:
  • Department of Computer Science, Graduate School of Systems and Information Engineering, University of Tsukuba, Ibaraki-ken, Japan 305-8577;Department of Computer Science, Graduate School of Systems and Information Engineering, University of Tsukuba, Ibaraki-ken, Japan 305-8577;Department of Computer Science, Graduate School of Systems and Information Engineering, University of Tsukuba, Ibaraki-ken, Japan 305-8577;Department of Computer Science, Graduate School of Systems and Information Engineering, University of Tsukuba, Ibaraki-ken, Japan 305-8577

  • Venue:
  • DEXA '09 Proceedings of the 20th International Conference on Database and Expert Systems Applications
  • Year:
  • 2009

Abstract

Approaches that extract related words (terms) from co-occurrence sometimes perform poorly. Two words that frequently co-occur in the same documents are considered related, yet they may not be related at all, sharing neither a common meaning nor similar semantics. We address this problem by taking the page designer's intention into account and propose a new model for extracting related words. Our approach is based on the observation that web page designers usually place correlated hyperlinks close to one another on the page as rendered in the browser. We developed a browser-based crawler to collect such "geographically" close hyperlinks; by clustering these hyperlinks according to their pixel coordinates, we extract related words that closely reflect the designer's intention. Experimental results show that our method captures the web page designer's intention with extremely high precision. Moreover, the experiments indicate that our extraction method obtains related words with high average precision.
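The idea of grouping hyperlinks by on-screen position and reading related words off each group can be illustrated with a short Python sketch. The abstract does not specify the clustering algorithm, distance measure, or threshold, so everything below is an assumption rather than the authors' method: the hypothetical `Anchor` type stores an anchor's text and the pixel coordinates of its top-left corner, `cluster_anchors` applies single-linkage clustering with a hypothetical `max_gap` pixel threshold, and `related_pairs` treats every pair of distinct anchor texts within one cluster as a candidate pair of related words.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Anchor:
    """Hypothetical record for one hyperlink as rendered in the browser."""
    text: str  # anchor text of the hyperlink
    x: float   # left edge in pixels (assumed rendered coordinate)
    y: float   # top edge in pixels (assumed rendered coordinate)


def cluster_anchors(anchors, max_gap):
    """Single-linkage clustering (an assumed stand-in for the paper's method):
    two anchors land in the same cluster if a chain of anchors, each within
    max_gap pixels (Euclidean distance) of the next, connects them."""
    parent = list(range(len(anchors)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(anchors)), 2):
        a, b = anchors[i], anchors[j]
        if math.hypot(a.x - b.x, a.y - b.y) <= max_gap:
            parent[find(i)] = find(j)  # merge the two clusters

    # Collect clusters, preserving the original order of anchors.
    groups = {}
    for i, anchor in enumerate(anchors):
        groups.setdefault(find(i), []).append(anchor)
    return list(groups.values())


def related_pairs(clusters):
    """Treat every pair of distinct, normalized anchor texts in a cluster
    as candidate related words (an assumed extraction rule)."""
    pairs = set()
    for cluster in clusters:
        words = sorted({a.text.strip().lower() for a in cluster})
        pairs.update(combinations(words, 2))
    return pairs


if __name__ == "__main__":
    # Hypothetical page: a vertical menu on the left and another on the right.
    page = [
        Anchor("Python", 10, 100), Anchor("Java", 10, 120), Anchor("Ruby", 10, 140),
        Anchor("Weather", 600, 100), Anchor("Sports", 600, 120),
    ]
    for cluster in cluster_anchors(page, max_gap=30):
        print([a.text for a in cluster])
    print(sorted(related_pairs(cluster_anchors(page, max_gap=30))))
```

Single linkage is used here because links in a vertical menu are each close to their neighbour but the first and last items can be far apart; chaining keeps such a menu together. Using only the top-left corner and a fixed threshold is a simplification; a real crawler would likely use full bounding boxes and tune the threshold per page.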