Single-pass clustering for peer-to-peer information retrieval: the effect of document ordering

  • Authors: Iraklis A. Klampanos; Joemon M. Jose; C. J. "Keith" van Rijsbergen

  • Affiliations: University of Glasgow, Scotland; University of Glasgow, Scotland; University of Glasgow, Scotland

  • Venue: InfoScale '06 Proceedings of the 1st international conference on Scalable information systems
  • Year: 2006

Abstract

Document clustering has been a particularly active research field within the Information Retrieval (IR) community. Among the numerous clustering algorithms proposed, single-pass clustering stands out for both its time and space efficiency. However, it is generally acknowledged that single-pass clustering has a major defect: its output depends on the order in which documents are presented. Building on our previous work, and having identified single-pass clustering as potentially useful for peer-to-peer (P2P) IR, we study the extent to which this order dependence holds in practice. We do so by experimenting with two large web-based testbeds that are suitable for P2P IR evaluation. The results of our study show that document ordering does not matter in practice for single-pass clustering.
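
For readers unfamiliar with the technique, the following is a minimal sketch of a generic single-pass clustering loop, written to illustrate why the output can depend on document order. It is not the authors' implementation: the cosine similarity measure, the fixed threshold, the summed-vector cluster representative, and the names `cosine` and `single_pass` are all assumptions made for illustration, and the three toy documents are invented.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, threshold):
    """Cluster (doc_id, vector) pairs in one pass, in the order given.

    Each document joins the most similar existing cluster if the
    similarity reaches `threshold` (an assumed parameter); otherwise
    it starts a new cluster. Returns a list of member-id lists.
    """
    clusters = []  # each cluster: {"sums": term -> summed weight, "members": [ids]}
    for doc_id, vec in docs:
        # Most similar cluster so far; ties go to the earliest cluster.
        best = max(clusters, key=lambda c: cosine(vec, c["sums"]), default=None)
        if best is not None and cosine(vec, best["sums"]) >= threshold:
            # Summed weights act as the representative; cosine ignores scale,
            # so this behaves like a mean centroid.
            for term, weight in vec.items():
                best["sums"][term] = best["sums"].get(term, 0.0) + weight
            best["members"].append(doc_id)
        else:
            clusters.append({"sums": dict(vec), "members": [doc_id]})
    return [c["members"] for c in clusters]


# Toy documents (invented): B overlaps with both A and C, A and C share nothing.
docs = [("A", {"x": 1.0}), ("B", {"x": 1.0, "y": 1.0}), ("C", {"y": 1.0})]

forward = single_pass(docs, threshold=0.7)
backward = single_pass(docs[::-1], threshold=0.7)
print(forward)   # [['A', 'B'], ['C']]
print(backward)  # [['C', 'B'], ['A']]
```

In this toy case document B is absorbed by whichever neighbour arrives first, so reversing the input changes the partition. This only shows that order dependence is possible in principle; the study summarised above examines how much it matters on large, realistic testbeds.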