A study of the overlap among document representations

Authors:
Padima Das-Gupta;Jeffrey Katzer
Affiliations:
Syracuse University, Syracuse, N.Y.;Syracuse University, Syracuse, N.Y.
Venue:
SIGIR '83 Proceedings of the 6th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
1983

Citing 0
Cited 8

A Fusion Approach to XML Structured Document Retrieval

Information Retrieval
ProbFuse: a probabilistic approach to data fusion

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Probability-based fusion of information retrieval result sets

Artificial Intelligence Review
From "Identical" to "Similar": Fusing Retrieved Lists Based on Inter-document Similarities

ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory
Properties of optimally weighted data fusion in CBMIR

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Cluster-based fusion of retrieved lists

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Re-ranking search results using an additional retrieved list

Information Retrieval
From "identical" to "similar": fusing retrieved lists based on inter-document similarities

Journal of Artificial Intelligence Research

Quantified Score

Hi-index	0.00

Visualization

Abstract

Most previous investigations comparing the performance of different representations have used recall and precision as performance measures. However, there is evidence to show that these measures are insensitive to an important difference between representations. To explain, two representations may perform similarly on these measures, while retrieving very different sets of documents. Equivalence of representations should be decided on the basis of similarity in performance and similarity in the documents retrieved. This study compared the performance of four representations in the PsycAbs database. In addition, overlap between retrieved sets was also computed where overlap is the proportion of retrieved documents that are the same for pairs of document representations. Results indicate that for any two representations considered, performance values differed slightly while overlap scores were also low, thus supporting the evidence that recall and precision as performance measures mask differences between the sets of retrieved documents. Results are interpreted to propose an optimal ordering of the representations and to examine the contribution of each representation given this combination.