Achieving high precisions with peer-to-peer is possible!

Authors:
Judith Winter; Gerold Kühne
Affiliations:
University of Applied Science, Frankfurt, Germany; University of Applied Science, Frankfurt, Germany
Venue:
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval
Year:
2009



Abstract

Previously, centralized stand-alone solutions had no problem coping with the load of storing, indexing, and searching the small test collections used for evaluating search results at INEX. However, searching the new large-scale Wikipedia collection of 2009 requires far more resources, such as processing power, RAM, and index space. It is therefore more important than ever to consider efficiency when performing XML retrieval tasks on such a large collection. On the other hand, the rich markup of the new collection is an opportunity to exploit the given structure and obtain a more efficient search. This paper describes our experiments with distributed search techniques based on XML retrieval. Our aim is to improve both effectiveness and efficiency; we have therefore submitted search results to both the Efficiency Track and the Ad Hoc Track. In our experiments, the collection, index, and search load are split over a peer-to-peer (P2P) network to gain efficiency through load balancing when searching large-scale collections. Since the bandwidth consumed between searching peers has to be limited in order to achieve a scalable, efficient system, we exploit XML structure to reduce the number of messages sent between peers. Although our search engine SPIRIX mainly aims at efficiency, it achieved fairly high precision and made it into the top 10 systems (focused task). It ranked 7th in the Ad Hoc Track (59%) and came first in terms of precision in the Efficiency Track (both categories of topics). For the first time at INEX, a P2P system achieved an official search quality comparable with the top 10 centralized solutions!
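To make the idea of splitting the index over peers a little more concrete, here is a minimal sketch under loudly stated assumptions. It is not SPIRIX's implementation, which the abstract does not describe. It assumes a Chord-style distributed hash table (keys placed on their successor peer on a hash ring) and a toy key scheme that stores every posting both under the plain term and under a hypothetical `term|tag` key combining the term with its XML element name. The names `Ring`, `build_index`, `lookup`, `M`, and the `peerN` identifiers are all hypothetical. The sketch counts postings shipped per lookup as a rough proxy for bandwidth: a query that carries a structural hint fetches a smaller posting list from a single peer.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict

M = 16  # hypothetical identifier-space size: ring positions 0 .. 2**M - 1


def ring_id(key: str) -> int:
    """Map a string onto the identifier ring (SHA-1 reduced modulo 2**M)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


class Ring:
    """Chord-style placement (assumed): a key lives on the first peer at or after its ID."""

    def __init__(self, peers):
        self.nodes = sorted((ring_id(p), p) for p in peers)
        self.ids = [nid for nid, _ in self.nodes]

    def successor(self, key: str) -> str:
        i = bisect_left(self.ids, ring_id(key))
        return self.nodes[i % len(self.nodes)][1]  # wrap past the largest ID


def build_index(ring, elements):
    """Store each posting under a plain-term key and a term|tag key on the owning peer."""
    storage = defaultdict(lambda: defaultdict(list))  # peer -> key -> postings
    for doc_id, tag, text in elements:
        for term in set(text.lower().split()):
            for key in (term, f"{term}|{tag}"):
                storage[ring.successor(key)][key].append((doc_id, tag))
    return storage


def lookup(ring, storage, term, tag=None):
    """Fetch one posting list with one message to the peer that owns the key."""
    key = term if tag is None else f"{term}|{tag}"
    peer = ring.successor(key)
    return peer, storage[peer].get(key, [])


if __name__ == "__main__":
    ring = Ring([f"peer{i}" for i in range(8)])
    # Toy (doc_id, element_tag, text) triples standing in for XML elements.
    elements = [
        (1, "title", "peer to peer retrieval"),
        (1, "section", "xml retrieval on a distributed index"),
        (2, "section", "retrieval of wikipedia articles"),
        (3, "caption", "retrieval effectiveness plot"),
    ]
    index = build_index(ring, elements)
    _, all_hits = lookup(ring, index, "retrieval")
    _, title_hits = lookup(ring, index, "retrieval", "title")
    print(len(all_hits), len(title_hits))  # 4 1
```

The plain-term lookup ships all four postings for "retrieval", while the structure-restricted lookup ships only the one posting from a `title` element. Hashing keys onto the ring also spreads index storage and lookup load across peers, which is the load-balancing effect the abstract refers to.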