Examining the information retrieval process from an inductive perspective

Authors:
Ronan Cummins;Mounia Lalmas;Colm O'Riordan
Affiliations:
University of Glasgow, Glasgow, United Kingdom;University of Glasgow, Glasgow, United Kingdom;National University of Ireland, Galway, Galway, United Kingdom
Venue:
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Year:
2010

Citing 17
Cited 0

Learning collection fusion strategies

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Exploring the similarity space

ACM SIGIR Forum
A vector space model for automatic indexing

Communications of the ACM
A probabilistic model of information retrieval: development and comparative experiments

Information Processing and Management: an International Journal
Query-based sampling of text databases

ACM Transactions on Information Systems (TOIS)
Probabilistic models of information retrieval based on measuring the divergence from randomness

ACM Transactions on Information Systems (TOIS)
A study of smoothing methods for language models applied to information retrieval

ACM Transactions on Information Systems (TOIS)
A formal study of information retrieval heuristics

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
An exploration of axiomatic approaches to information retrieval

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Evolving local and global weighting schemes in information retrieval

Information Retrieval
Semantic term matching in axiomatic approaches to information retrieval

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Distributed query sampling: a quality-conscious approach

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
An exploration of proximity measures in information retrieval

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Learning in a pairwise term-term proximity framework for information retrieval

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Sources of evidence for vertical selection

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Query side evaluation: an empirical analysis of effectiveness and effort

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Measuring constraint violations in information retrieval

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval

Quantified Score

Hi-index	0.00

Visualization

Abstract

Term-weighting functions derived from various models of retrieval aim to model human notions of relevance more accurately. However, there is a lack of analysis of the sources of evidence from which important features of these term weighting schemes originate. In general, features pertaining to these term-weighting schemes can be collected from (1) the document, (2) the entire collection and (3) the query. In this work, we perform an empirical analysis to determine the increase in effectiveness as information from these three different sources becomes more accurate. First, we determine the number of documents to be indexed to accurately estimate collection-wide features to obtain near optimal effectiveness for a range of a term-weighting functions. Similarly, we determine the amount of a document and query that must be sampled to achieve near-peak effectiveness. This analysis also allows us to determine the factors that contribute most to the performance of a term-weighting function (i.e. the document, the collection or the query). We use our framework to construct a new model of weighting where we discard the 'bag of words' model and aim to retrieve documents based on the initial physical representation of a document using some basic axioms of retrieval. We show that this is a good first step towards incorporating some more interesting features into a term-weighting function