Studying the clustering paradox and scalability of search in highly distributed environments

  • Authors:
  • Weimao Ke; Javed Mostafa

  • Affiliations:
  • Drexel University, Philadelphia, PA; University of North Carolina at Chapel Hill, Chapel Hill, NC

  • Venue:
  • ACM Transactions on Information Systems (TOIS)
  • Year:
  • 2013

Abstract

With the ubiquitous production, distribution, and consumption of information, today's digital environments such as the Web are increasingly large and decentralized. Central control over information collections and systems in these environments is hardly possible. Searching these information spaces raises problems beyond the traditional boundaries of information retrieval (IR) research. This article addresses one important aspect of the scalability challenges facing information retrieval models and investigates a decentralized, organic view of information systems for search in large-scale networks. Drawing on observations from earlier studies, we conduct a series of experiments on decentralized search in large-scale networked information spaces. Results show that the way distributed systems interconnect is crucial to retrieval performance and search scalability. In particular, across various experimental settings and retrieval tasks, we find a consistent phenomenon, the Clustering Paradox, in which the level of network clustering (semantic overlay) imposes a scalability limit. Scalable search is well supported by a specific, balanced level of network clustering that emerges from local system interconnectivity. Departure from that level, toward either stronger or weaker clustering, degrades search performance, and the degradation is dramatic in large-scale networks.
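
To make the idea of a clustering level that governs decentralized search more concrete, the sketch below simulates greedy routing on a toy network. It is not the article's experimental setup, which the abstract does not specify. It assumes a Kleinberg-style ring in which every node knows its two ring neighbours plus one long-range contact, chosen with probability proportional to distance raised to the power `-alpha`. The names `build_network`, `greedy_search`, `mean_hops`, and the parameter `alpha` are hypothetical and serve only as a stand-in for the "level of network clustering": `alpha = 0` gives uniformly random long links (weak clustering), while a large `alpha` keeps nearly all links local (strong clustering).

```python
import random


def ring_distance(a, b, n):
    """Shortest distance between positions a and b on a ring of n nodes."""
    d = abs(a - b)
    return min(d, n - d)


def build_network(n, alpha, rng):
    """Toy overlay (an assumption, not the article's model): each node links
    to both ring neighbours plus one long-range contact whose distance d is
    drawn with probability proportional to d ** -alpha."""
    distances = list(range(1, n // 2 + 1))
    weights = [d ** -alpha for d in distances]
    links = {}
    for u in range(n):
        d = rng.choices(distances, weights=weights, k=1)[0]
        direction = rng.choice((-1, 1))
        links[u] = [(u - 1) % n, (u + 1) % n, (u + direction * d) % n]
    return links


def greedy_search(links, source, target, n):
    """Decentralized search: each node forwards the query to whichever of
    its own contacts is closest to the target. Returns the hop count."""
    hops = 0
    current = source
    while current != target:
        # A ring neighbour is always one step closer, so this terminates.
        current = min(links[current], key=lambda v: ring_distance(v, target, n))
        hops += 1
    return hops


def mean_hops(n, alpha, trials=200, seed=0):
    """Average hop count over random source/target pairs."""
    rng = random.Random(seed)
    links = build_network(n, alpha, rng)
    total = 0
    for _ in range(trials):
        source, target = rng.randrange(n), rng.randrange(n)
        total += greedy_search(links, source, target, n)
    return total / trials


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0, 2.0, 5.0):
        print(f"alpha={alpha}: mean hops = {mean_hops(2000, alpha):.1f}")
```

Running the script for several values of `alpha` and several network sizes gives a rough feel for how search cost depends on the clustering level in this toy model. With a very large `alpha` the long-range links collapse onto near neighbours, so a query effectively walks around the ring. Any resemblance to the article's quantitative results should not be assumed.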