Single-pass clustering for peer-to-peer information retrieval: the effect of document ordering

Authors:
Iraklis A. Klampanos; Joemon M. Jose; C. J. "Keith" van Rijsbergen
Affiliations:
University of Glasgow, Scotland; University of Glasgow, Scotland; University of Glasgow, Scotland
Venue:
InfoScale '06 Proceedings of the 1st international conference on Scalable information systems
Year:
2006

Abstract

Document clustering has been a particularly active research field within the Information Retrieval (IR) community. Among the numerous clustering algorithms proposed, single-pass clustering stands out in terms of both time and space efficiency. However, it is generally acknowledged that single-pass clustering has a major defect, namely that its output depends on the order in which documents are presented. Building on our previous work, and having identified single-pass clustering as potentially useful for P2P IR, we study the extent to which this is true in practical terms. We do so by experimenting with two large web-based testbeds, which are suitable for Peer-to-Peer IR evaluation. The results of our study show that document ordering does not practically matter for single-pass clustering.
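
To make the order-dependence concern concrete, here is a minimal sketch of a generic single-pass clustering loop. It is not the authors' implementation: the cosine similarity measure, the running-mean centroid update, the threshold value, and the three toy document vectors are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, threshold):
    """Cluster (doc_id, vector) pairs in one pass over the input order.

    Each document joins the most similar existing cluster if that
    similarity reaches the (assumed) threshold; otherwise it opens a
    new cluster. Earlier assignments are never revisited.
    """
    clusters = []
    for doc_id, vec in docs:
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            # Update the centroid as the running mean of member vectors.
            n = len(best["members"])
            best["centroid"] = [
                (c * n + x) / (n + 1) for c, x in zip(best["centroid"], vec)
            ]
            best["members"].append(doc_id)
        else:
            clusters.append({"centroid": list(vec), "members": [doc_id]})
    return clusters


def partition(clusters):
    """Order-independent view of a clustering, for comparing two runs."""
    return {frozenset(c["members"]) for c in clusters}


# Hypothetical toy documents: unit-like vectors at roughly 0, 30 and 60 degrees.
docs = [("d0", [1.0, 0.0]), ("d30", [0.866, 0.5]), ("d60", [0.5, 0.866])]

forward = partition(single_pass(docs, threshold=0.85))
backward = partition(single_pass(docs[::-1], threshold=0.85))
print(forward == backward)
```

In this toy case the middle document attaches to whichever neighbour arrives first, so the two orderings yield different partitions. That is the theoretical defect the abstract refers to; the study's claim is that on large web-based testbeds the effect is not significant in practice.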