Probabilistic models of information retrieval based on measuring the divergence from randomness

Authors:
Gianni Amati; Cornelis Joost Van Rijsbergen
Affiliations:
University of Glasgow, Fondazione Ugo Bordoni, Roma, Italy; University of Glasgow, Glasgow, Scotland
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
2002


Abstract

We introduce and create a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes we study the binomial distribution and Bose-Einstein statistics. We define two types of term frequency normalization for tuning term weights in the document-query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed document. The second normalization is related to the document length and to other statistics. These two normalization methods are applied to the basic models in succession to obtain weighting formulae. Results show that our framework produces different nonparametric models forming baseline alternatives to the standard tf-idf model.
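
The abstract gives no formulae, so the following is only a minimal Python sketch of how one such weight might be assembled, under stated assumptions. It assumes a geometric approximation to Bose-Einstein statistics as the basic randomness model, a Laplace-style after-effect for the first normalization (information gain), and a logarithmic document-length rescaling for the second. The function names, the parameter `c`, and its default of 1.0 are illustrative choices, not taken from the abstract.

```python
import math


def tf_normalized(tf, doc_len, avg_doc_len, c=1.0):
    """Second normalization (assumed form): rescale the raw term frequency
    by document length. `c` is a hypothetical tuning parameter."""
    return tf * math.log2(1.0 + c * avg_doc_len / doc_len)


def inf_geometric(tfn, coll_freq, num_docs):
    """Basic model (assumed form): -log2 of the probability of seeing `tfn`
    occurrences under a geometric approximation to Bose-Einstein statistics,
    with mean lam = coll_freq / num_docs."""
    lam = coll_freq / num_docs
    return -math.log2(1.0 / (1.0 + lam)) - tfn * math.log2(lam / (1.0 + lam))


def gain_laplace(tfn):
    """First normalization (assumed form): information gain
    1 - tfn / (tfn + 1), which simplifies to 1 / (tfn + 1)."""
    return 1.0 / (tfn + 1.0)


def dfr_weight(tf, doc_len, avg_doc_len, coll_freq, num_docs, c=1.0):
    """Combine the two normalizations with the basic model in succession."""
    if tf <= 0:
        return 0.0
    tfn = tf_normalized(tf, doc_len, avg_doc_len, c)
    return gain_laplace(tfn) * inf_geometric(tfn, coll_freq, num_docs)


if __name__ == "__main__":
    # Toy statistics, invented for illustration only.
    rare = dfr_weight(tf=3, doc_len=100, avg_doc_len=100, coll_freq=10, num_docs=1000)
    common = dfr_weight(tf=3, doc_len=100, avg_doc_len=100, coll_freq=5000, num_docs=1000)
    print(f"rare term weight:   {rare:.3f}")
    print(f"common term weight: {common:.3f}")
```

With these toy statistics, a term that is rare in the collection receives a larger weight than a common one at the same within-document frequency, which is the behaviour a divergence-from-randomness weight is meant to capture: the less likely the observed frequency is under the random process, the more informative the term.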