Achieving high precisions with peer-to-peer is possible!

  • Authors:
  • Judith Winter; Gerold Kühne

  • Affiliations:
  • University of Applied Science, Frankfurt, Germany; University of Applied Science, Frankfurt, Germany

  • Venue:
  • INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval
  • Year:
  • 2009

Abstract

Until recently, centralized stand-alone solutions had no problem coping with the load of storing, indexing and searching the small test collections used for evaluating search results at INEX. However, searching the new large-scale Wikipedia collection of 2009 requires many more resources, such as processing power, RAM, and index space. It is hence more important than ever to consider efficiency when performing XML-Retrieval tasks on such a big collection. On the other hand, the rich markup of the new collection is an opportunity to exploit the given structure and obtain a more efficient search. This paper describes our experiments using distributed search techniques based on XML-Retrieval. Our aim is to improve both effectiveness and efficiency; we have thus submitted search results to both the Efficiency Track and the Ad Hoc Track. In our experiments, the collection, index, and search load are split over a peer-to-peer (P2P) network to gain efficiency in terms of load balancing when searching large-scale collections. Since the bandwidth consumption between searching peers has to be limited in order to achieve a scalable, efficient system, we exploit XML structure to reduce the number of messages sent between peers. Although it aims mainly at efficiency, our search engine SPIRIX achieved quite high precision and made it into the top-10 systems (focused task). It ranked 7th in the Ad Hoc Track (59%) and came first in terms of precision in the Efficiency Track (both categories of topics). For the first time at INEX, a P2P system achieved an official search quality comparable with the top-10 centralized solutions!