A similarity reinforcement algorithm for heterogeneous web pages

Authors:
Ning Liu; Jun Yan; Fengshan Bai; Benyu Zhang; Wensi Xi; Weiguo Fan; Zheng Chen; Lei Ji; Chenyong Hu; Wei-Ying Ma
Affiliations:
Department of Mathematical Science, Tsinghua University, Beijing, P.R. China; LMAM, Department of Information Science, School of Mathematical Science, Peking University, Beijing, P.R. China; Department of Mathematical Science, Tsinghua University, Beijing, P.R. China; Microsoft Research Asia, Beijing, P.R. China; Computer Science, Virginia Polytechnic Institute and State University; Computer Science, Virginia Polytechnic Institute and State University; Microsoft Research Asia, Beijing, P.R. China; Microsoft Research Asia, Beijing, P.R. China; Institute of Software, Lab for Internet Software Technologies, CAS, Beijing, P.R. China; Microsoft Research Asia, Beijing, P.R. China
Venue:
APWeb'05 Proceedings of the 7th Asia-Pacific web conference on Web Technologies Research and Development
Year:
2005

Abstract

Many machine learning and data mining algorithms rely crucially on similarity metrics. However, most early approaches, such as the Vector Space Model and Latent Semantic Indexing, used only a single type of relationship to measure the similarity of data objects. In this paper, we first use an Intra- and Inter-Type Relationship Matrix (IITRM) to represent a set of heterogeneous data objects and their inter-relationships. We then propose a novel similarity-calculating algorithm over the IITRM that iteratively integrates information from heterogeneous sources. This algorithm can help detect latent relationships among heterogeneous data objects. It is based on the intuition that intra-type relationships should affect inter-type relationships, and vice versa. Experimental results on the MSN logs dataset show that our algorithm outperforms traditional cosine similarity.
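
The abstract does not state the update rule, so the following is only a minimal sketch of the general idea of iterative similarity reinforcement, not the paper's algorithm. It assumes two object types (queries and pages), a hypothetical click matrix as the inter-type relationship, identity matrices as the initial intra-type similarities, and a generic SimRank-style propagation step with a made-up blending weight `alpha`. All function names and data are illustrative. As a simplification, only the intra-type similarities are updated while the inter-type links stay fixed.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def row_normalize(m):
    """Scale each row to sum to 1 (all-zero rows are left as zeros)."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def blend(base, propagated, alpha):
    """Mix the initial similarity with the similarity propagated via links."""
    return [[(1 - alpha) * b + alpha * p for b, p in zip(br, pr)]
            for br, pr in zip(base, propagated)]

def cosine(u, v):
    """Plain cosine similarity, used here as the single-relationship baseline."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def reinforce(s_a, s_b, r_ab, alpha=0.5, iterations=10):
    """Assumed update: objects of one type become more similar when they
    link to similar objects of the other type, and vice versa."""
    p_ab = row_normalize(r_ab)             # type A -> type B link weights
    p_ba = row_normalize(transpose(r_ab))  # type B -> type A link weights
    cur_a, cur_b = s_a, s_b
    for _ in range(iterations):
        # Both propagations read the previous iterate (simultaneous update).
        prop_a = matmul(matmul(p_ab, cur_b), transpose(p_ab))
        prop_b = matmul(matmul(p_ba, cur_a), transpose(p_ba))
        cur_a = blend(s_a, prop_a, alpha)
        cur_b = blend(s_b, prop_b, alpha)
    return cur_a, cur_b

# Hypothetical click data: rows are queries q0..q2, columns are pages p0..p1.
# q0 clicked only p0, q1 clicked only p1, and q2 clicked both.
clicks = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]

baseline_q0_q1 = cosine(clicks[0], clicks[1])
sim_queries, sim_pages = reinforce(identity(3), identity(2), clicks)
print("cosine(q0, q1):    ", baseline_q0_q1)
print("reinforced(q0, q1):", sim_queries[0][1])
```

In this toy data, q0 and q1 share no clicked page, so their cosine similarity is zero. Under the assumed update, the two pages become similar because q2 clicked both, and that page similarity then flows back to give q0 and q1 a nonzero similarity, which illustrates the kind of latent relationship the abstract refers to.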