Semantic indexing in structured peer-to-peer networks

  • Authors:
  • Ronaldo A Ferreira;Mehmet Koyutürk;Suresh Jagannathan;Ananth Grama

  • Affiliations:
  • Department of Computer Science, Federal University of Mato Grosso do Sul, Brazil;Department of Electrical Engineering & Computer Science, Case Western Reserve University, USA;Department of Computer Science, Purdue University, USA;Department of Computer Science, Purdue University, USA

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

The past few years have seen tremendous advances in distributed storage infrastructure. Unstructured and structured overlay networks have been successfully used in a variety of applications, ranging from file-sharing to scientific data repositories. While unstructured networks benefit from low maintenance overhead, the associated search costs are high. On the other hand, structured networks have higher maintenance overheads, but facilitate bounded time search of installed keywords. When dealing with typical data sets, though, it is infeasible to install every possible search term as a keyword into the structured overlay. State-of-the art semantic indexing techniques have been successfully integrated into peer-to-peer (P2P) systems using semantic overlays. However, exiting approaches are based on the premise that the fundamental ingredient of semantic indexing, a semantic basis for the underlying data, is globally available, which is not likely to be the case in practice. Therefore, development of techniques to efficiently compute basis vectors for data distributed across peers is important for large-scale deployment of semantic indexing in P2P systems. In this paper, we present a novel structured overlay that integrates aspects of semantic indexing using non-orthogonal matrix decompositions, with the hash structure of the overlay. We adopt PROXIMUS, a recursive decomposition method for computing concise representations for binary data sets, to locally identify latent patterns in data distributed across peers. To enable efficient consolidation of patterns, we rely on distributed hash tables (DHT), commonly used in various applications in P2P networks. The discrete nature of non-orthogonal matrix decomposition is well suited to the binary key structure of DHTs, resulting in an indexing method, PMINER, that enables the network to deliver efficient and accurate responses to semantic queries. We present the algorithmic underpinnings of PMINER and demonstrate its excellent performance characteristics on real, as well as synthetic data sets.