Learning collection fusion strategies
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Exploring the similarity space
ACM SIGIR Forum
A vector space model for automatic indexing
Communications of the ACM
A probabilistic model of information retrieval: development and comparative experiments
Information Processing and Management: an International Journal
Query-based sampling of text databases
ACM Transactions on Information Systems (TOIS)
Probabilistic models of information retrieval based on measuring the divergence from randomness
ACM Transactions on Information Systems (TOIS)
A study of smoothing methods for language models applied to information retrieval
ACM Transactions on Information Systems (TOIS)
A formal study of information retrieval heuristics
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
An exploration of axiomatic approaches to information retrieval
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Evolving local and global weighting schemes in information retrieval
Information Retrieval
Semantic term matching in axiomatic approaches to information retrieval
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Distributed query sampling: a quality-conscious approach
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
An exploration of proximity measures in information retrieval
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Learning in a pairwise term-term proximity framework for information retrieval
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Sources of evidence for vertical selection
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Query side evaluation: an empirical analysis of effectiveness and effort
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Measuring constraint violations in information retrieval
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Hi-index | 0.00 |
Term-weighting functions derived from various models of retrieval aim to model human notions of relevance more accurately. However, there is a lack of analysis of the sources of evidence from which important features of these term weighting schemes originate. In general, features pertaining to these term-weighting schemes can be collected from (1) the document, (2) the entire collection and (3) the query. In this work, we perform an empirical analysis to determine the increase in effectiveness as information from these three different sources becomes more accurate. First, we determine the number of documents to be indexed to accurately estimate collection-wide features to obtain near optimal effectiveness for a range of a term-weighting functions. Similarly, we determine the amount of a document and query that must be sampled to achieve near-peak effectiveness. This analysis also allows us to determine the factors that contribute most to the performance of a term-weighting function (i.e. the document, the collection or the query). We use our framework to construct a new model of weighting where we discard the 'bag of words' model and aim to retrieve documents based on the initial physical representation of a document using some basic axioms of retrieval. We show that this is a good first step towards incorporating some more interesting features into a term-weighting function