When a few highly relevant answers are enough

Authors:
Miro Lehtonen
Affiliations:
Department of Computer Science, University of Helsinki, Finland
Venue:
INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval
Year:
2005

Abstract

Our XML retrieval system EXTIRP was slightly modified from its 2004 version for INEX 2005. For the first time, the system is completely independent of the document type of the XML documents in the collection, which justifies describing our methodology as “heterogeneous”. Nevertheless, the 2005 version of EXTIRP is still incomplete: it includes neither query expansion nor dynamic determination of the answer size. The lack of the latter is a serious limitation, because the XCG-based metrics favour systems that can adjust the size of an answer according to its relevance to the query. Our main focus was the CO.Focussed task of the adhoc track, although we also submitted runs for other tasks. Perhaps because our system is incomplete, the initial results reveal its characteristics more clearly than in earlier years. Even when partially stripped, EXTIRP ranks the most obvious highly relevant answers at the top ranks better than many other systems. This relatively high precision at the top ranks comes at the cost of losing sight of marginally relevant content. The cost shows in some exceptionally steep curves and in our rankings relative to other systems, which sink from the top at low recall levels towards the bottom at higher recall levels. Further supporting this observation, our runs are ranked higher with the strict quantisation than with any other quantisation function, regardless of the metric.
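To illustrate why a run that returns only the most obvious highly relevant answers fares best under strict quantisation, here is a minimal sketch of normalised extended cumulated gain (nxCG) at rank k computed with two quantisation functions. It is not EXTIRP code and not the official INEX evaluation software. Assumptions: assessments are (exhaustivity, specificity) pairs with exhaustivity in {0, 1, 2} and specificity in [0, 1]; `quant_strict` gives full credit only to the pair (2, 1); `quant_graded` is a hypothetical graded function (scaled exhaustivity times specificity); the ideal ranking is simply the pool sorted by gain, ignoring element overlap; and the toy assessments are invented for illustration.

```python
from typing import Callable, Sequence, Tuple

# (exhaustivity, specificity): exhaustivity in {0, 1, 2}, specificity in [0, 1].
# These scales are an assumption for this sketch.
Assessment = Tuple[int, float]
Quantiser = Callable[[int, float], float]


def quant_strict(e: int, s: float) -> float:
    """Full credit only for highly exhaustive, fully specific elements."""
    return 1.0 if e == 2 and s == 1.0 else 0.0


def quant_graded(e: int, s: float) -> float:
    """Hypothetical graded quantisation: exhaustivity scaled to [0, 1] times specificity."""
    return (e / 2) * s


def nxcg_at(run: Sequence[Assessment], pool: Sequence[Assessment],
            k: int, quant: Quantiser) -> float:
    """Cumulated gain of the run's top k divided by the ideal cumulated gain at k.

    Simplification: the ideal ranking is the pool sorted by quantised gain,
    with no handling of overlapping XML elements.
    """
    run_gain = sum(quant(e, s) for e, s in run[:k])
    ideal = sorted((quant(e, s) for e, s in pool), reverse=True)[:k]
    ideal_gain = sum(ideal)
    return run_gain / ideal_gain if ideal_gain > 0 else 0.0


if __name__ == "__main__":
    # Invented toy pool of assessed elements for one topic.
    pool = [(2, 1.0), (2, 1.0), (2, 0.5), (1, 1.0), (1, 0.5), (1, 0.5)]
    # A "precise" run: two highly relevant answers, then non-relevant ones.
    precise = [(2, 1.0), (2, 1.0), (0, 0.0), (0, 0.0)]
    # A "broad" run: only marginally or partially relevant answers.
    broad = [(1, 0.5), (2, 0.5), (1, 1.0), (1, 0.5)]
    for name, quant in [("strict", quant_strict), ("graded", quant_graded)]:
        print(name,
              round(nxcg_at(precise, pool, 4, quant), 3),
              round(nxcg_at(broad, pool, 4, quant), 3))
```

With this toy data the precise run scores 1.0 against 0.0 for the broad run under strict quantisation, but only about 0.667 against 0.5 under the graded function: the gap between the two runs narrows once marginally relevant content earns partial credit.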