DL meets P2P – distributed document retrieval based on classification and content

  • Authors:
  • Wolf-Tilo Balke;Wolfgang Nejdl;Wolf Siberski;Uwe Thaden

  • Affiliations:
  • L3S and University of Hannover, Hannover;L3S and University of Hannover, Hannover;L3S and University of Hannover, Hannover;L3S and University of Hannover, Hannover

  • Venue:
  • ECDL'05 Proceedings of the 9th European conference on Research and Advanced Technology for Digital Libraries
  • Year:
  • 2005

Abstract

Peer-to-peer architectures are a potentially powerful paradigm for retrieving documents over networks of digital libraries, avoiding single points of failure through massive federation of (independent) information sources. Today, sharing files over P2P infrastructures is already immensely successful, but it is restricted to simple metadata matching. When it comes to the retrieval of complex documents, however, capabilities as provided by digital libraries are needed. Digital libraries have to cope with compound documents. Though some document parts (like embedded images) can be retrieved efficiently using metadata matching, the text-based information needs different methods such as full-text search. For effective querying of texts, information such as inverse document frequencies is also essential. But due to the distributed nature of P2P networks, such 'collection-wide' information poses severe problems: for example, central updates triggered by every change in any document collection use up valuable bandwidth. We will present a novel indexing technique that allows querying using collection-wide information with respect to different classifications, and show the effectiveness of our scheme for practical applications. We will discuss our findings in detail and present simulations of the scheme's efficiency and scalability.
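
To illustrate why inverse document frequency counts as 'collection-wide' information, the following minimal Python sketch computes IDF values from the statistics of one peer and then from the merged statistics of two peers. It is not the indexing technique of the paper. The function names, the toy collections, and the plain formula idf(t) = log(N / df(t)) are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Return (df, n_docs) for one peer: df[t] is the number of documents containing term t."""
    df = Counter()
    for text in docs:
        # Use a set so each term is counted at most once per document.
        df.update(set(text.lower().split()))
    return df, len(docs)


def idf_from_stats(peer_stats):
    """Merge per-peer (df, n_docs) pairs and compute idf(t) = log(N / df(t))."""
    total_df = Counter()
    total_docs = 0
    for df, n_docs in peer_stats:
        total_df.update(df)  # adds the per-term counts
        total_docs += n_docs
    return {term: math.log(total_docs / count) for term, count in total_df.items()}


# Hypothetical toy collections held by two peers.
peer_a = ["peer to peer retrieval", "digital library retrieval"]
peer_b = ["digital library metadata", "image metadata matching"]

local_idf_a = idf_from_stats([document_frequencies(peer_a)])
merged_idf = idf_from_stats([document_frequencies(peer_a), document_frequencies(peer_b)])

print(local_idf_a["retrieval"], merged_idf["retrieval"])
```

In this toy setting the term "retrieval" occurs in every document of the first peer, so its local IDF is zero, while over the merged collection it becomes a discriminating term. Any change to either peer's documents can shift the merged values, which is the update problem the abstract describes.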