Node similarity in networked information spaces

  • Authors:
  • Wangzhong Lu;Jeannette Janssen;Evangelos Milios;Nathalie Japkowicz

  • Affiliations:
  • Dalhousie University, Halifax, Nova Scotia, Canada, B3H 3J5 (all four authors)

  • Venue:
  • CASCON '01 Proceedings of the 2001 conference of the Centre for Advanced Studies on Collaborative research
  • Year:
  • 2001

Abstract

Networked information spaces contain information entities, corresponding to nodes, which are connected by associations, corresponding to links in the network. Examples of networked information spaces are the World Wide Web, where information entities are web pages and associations are hyperlinks, and the scientific literature, where information entities are articles and associations are references to other articles. Similarity between information entities in a networked information space can be defined not only based on the content of the information entities, but also based on the connectivity established by the associations present. This paper explores the definition of similarity based on connectivity only, and proposes several algorithms for this purpose. Our metrics take advantage of the local neighbourhoods of the nodes in the networked information space. Therefore, explicit availability of the networked information space is not required, as long as a query engine is available for following links and extracting the necessary local neighbourhoods for similarity estimation. Two variations of similarity estimation between two nodes are described, one based on the separate local neighbourhoods of the nodes, and another based on the joint local neighbourhood expanded from both nodes at the same time. The algorithms are implemented and evaluated on the citation graph of computer science. The immediate application of this work is in finding papers similar to a given paper in a digital library, but the algorithms are also applicable to other networked information spaces, such as the Web.
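
The abstract does not spell out the metrics themselves, so the following is only a minimal sketch of the general idea, not the paper's algorithms. It assumes a hypothetical `get_links` callable standing in for the query engine, a fixed hop radius, and Jaccard overlap as a stand-in similarity measure for the separate-neighbourhood variant. For the joint variant it shows only how a neighbourhood might be grown from both nodes at once.

```python
def local_neighbourhood(seeds, get_links, radius):
    """Nodes within `radius` link-hops of any seed, fetched on demand via get_links."""
    seen = set(seeds)
    frontier = set(seeds)
    for _ in range(radius):
        reached = set()
        for node in frontier:
            reached.update(get_links(node))
        frontier = reached - seen   # keep only nodes not visited before
        seen |= frontier
    return seen


def separate_similarity(a, b, get_links, radius=1):
    """Jaccard overlap of two separately grown neighbourhoods (assumed measure)."""
    na = local_neighbourhood([a], get_links, radius) - {a, b}
    nb = local_neighbourhood([b], get_links, radius) - {a, b}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def joint_neighbourhood(a, b, get_links, radius=1):
    """Neighbourhood expanded from both nodes at the same time."""
    return local_neighbourhood([a, b], get_links, radius)


# Toy citation data (made up for illustration): paper -> papers it references.
REFERENCES = {
    "p1": {"r1", "r2", "r3"},
    "p2": {"r2", "r3", "r4"},
    "p3": {"r5"},
}


def get_links(node):
    """Hypothetical query engine: outgoing references plus incoming citations."""
    outgoing = REFERENCES.get(node, set())
    incoming = {src for src, dsts in REFERENCES.items() if node in dsts}
    return outgoing | incoming


print(separate_similarity("p1", "p2", get_links))  # share r2 and r3
print(separate_similarity("p1", "p3", get_links))  # share nothing
print(sorted(joint_neighbourhood("p1", "p2", get_links)))
```

Because the graph is reached only through `get_links`, the sketch never needs the whole network in memory, which mirrors the abstract's point that a link-following query engine is sufficient.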