A similarity reinforcement algorithm for heterogeneous web pages

  • Authors:
  • Ning Liu; Jun Yan; Fengshan Bai; Benyu Zhang; Wensi Xi; Weiguo Fan; Zheng Chen; Lei Ji; Chenyong Hu; Wei-Ying Ma

  • Affiliations:
  • Department of Mathematical Science, Tsinghua University, Beijing, P.R. China; LMAM, Department of Information Science, School of Mathematical Science, Peking University, Beijing, P.R. China; Department of Mathematical Science, Tsinghua University, Beijing, P.R. China; Microsoft Research Asia, Beijing, P.R. China; Computer Science, Virginia Polytechnic Institute and State University; Computer Science, Virginia Polytechnic Institute and State University; Microsoft Research Asia, Beijing, P.R. China; Microsoft Research Asia, Beijing, P.R. China; Institute of Software, Lab for Internet Software Technologies, CAS, Beijing, P.R. China; Microsoft Research Asia, Beijing, P.R. China

  • Venue:
  • APWeb'05 Proceedings of the 7th Asia-Pacific web conference on Web Technologies Research and Development
  • Year:
  • 2005

Abstract

Many machine learning and data mining algorithms rely crucially on similarity metrics. However, most early approaches, such as the Vector Space Model and Latent Semantic Indexing, use only a single type of relationship to measure the similarity of data objects. In this paper, we first use an Intra- and Inter-Type Relationship Matrix (IITRM) to represent a set of heterogeneous data objects and their inter-relationships. We then propose a novel similarity-calculating algorithm over the IITRM. The algorithm iteratively integrates relationship information from heterogeneous sources and can thereby help detect latent relationships among heterogeneous data objects. It is based on the intuition that intra-type relationships should affect inter-type relationships, and vice versa. Experimental results on the MSN logs dataset show that our algorithm outperforms the traditional cosine similarity.
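
The abstract does not state the update rule, so the following is only an illustrative sketch of the general idea of iterative similarity reinforcement, not the paper's algorithm. It assumes two object types (hypothetically, queries and pages), one inter-type matrix `r` of click counts, and a prior intra-type similarity matrix for each type. The function names, the blending weight `alpha`, the normalization, and the toy data are all assumptions made for illustration.

```python
from math import sqrt

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def cosine_normalize(m):
    """Divide m[i][j] by sqrt(m[i][i] * m[j][j]); rows with a zero diagonal stay zero."""
    d = [sqrt(m[i][i]) if m[i][i] > 0 else 0.0 for i in range(len(m))]
    return [[m[i][j] / (d[i] * d[j]) if d[i] and d[j] else 0.0
             for j in range(len(m))] for i in range(len(m))]

def blend(prior, propagated, alpha):
    """Convex combination of a prior similarity and a propagated one."""
    return [[(1 - alpha) * p + alpha * q for p, q in zip(pr, qr)]
            for pr, qr in zip(prior, propagated)]

def reinforce(r, s_a0, s_b0, alpha=0.5, iters=10):
    """Assumed update: each type's similarity is re-estimated from the
    other type's current similarity, propagated through the inter-type
    matrix r, and blended with that type's prior similarity."""
    rt = transpose(r)
    s_a, s_b = s_a0, s_b0
    for _ in range(iters):
        prop_a = cosine_normalize(matmul(matmul(r, s_b), rt))
        prop_b = cosine_normalize(matmul(matmul(rt, s_a), r))
        s_a = blend(s_a0, prop_a, alpha)
        s_b = blend(s_b0, prop_b, alpha)
    return s_a, s_b

# Toy data (hypothetical): 2 queries x 3 pages, entries are click counts.
r = [[1.0, 0.0, 1.0],   # query 0 clicked pages 0 and 2
     [0.0, 1.0, 0.0]]   # query 1 clicked page 1

# Prior intra-type similarity: pages 0 and 1 are assumed similar (0.8).
s_p0 = [[1.0, 0.8, 0.0],
        [0.8, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
s_q0 = identity(2)      # no prior knowledge about query similarity

# Baseline: cosine similarity between the query rows of r.
baseline = cosine_normalize(matmul(r, transpose(r)))

s_q, s_p = reinforce(r, s_q0, s_p0)
print("cosine(q0, q1)     =", baseline[0][1])
print("reinforced(q0, q1) =", s_q[0][1])
```

In this toy setting the two queries share no clicked page, so their cosine similarity is zero, while the reinforced similarity is positive because the pages they clicked are similar to each other. With an identity page-similarity matrix, the propagation step `cosine_normalize(r · rᵀ)` reduces to the cosine baseline itself.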