Interpreting TF-IDF term weights as making relevance decisions

Authors:
Ho Chung Wu;Robert Wing Pong Luk;Kam Fai Wong;Kui Lam Kwok
Affiliations:
The Hong Kong Polytechnic University, Kowloon, Hong Kong;The Hong Kong Polytechnic University, Kowloon, Hong Kong;The Chinese University of Hong Kong, Hong Kong SAR, The People's Republic of China;Queens College, City University of New York, Flushing, NY
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
2008

Abstract

A novel probabilistic retrieval model is presented. It forms a basis to interpret the TF-IDF term weights as making relevance decisions. It simulates the local relevance decision-making for every location of a document, and combines all of these “local” relevance decisions as the “document-wide” relevance decision for the document. The significance of interpreting TF-IDF in this way is the potential to: (1) establish a unifying perspective about information retrieval as relevance decision-making; and (2) develop advanced TF-IDF-related term weights for future elaborate retrieval models. Our novel retrieval model is simplified to a basic ranking formula that directly corresponds to the TF-IDF term weights. In general, we show that the term-frequency factor of the ranking formula can be rendered into different term-frequency factors of existing retrieval systems. In the basic ranking formula, the remaining quantity is −log p(r̄ | t ∈ d), where p(r̄ | t ∈ d) is interpreted as the probability of randomly picking a nonrelevant usage (denoted by r̄) of term t. Mathematically, we show that this quantity can be approximated by the inverse document-frequency (IDF). Empirically, we show that this quantity is related to IDF, using four reference TREC ad hoc retrieval data collections.
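
The idea of combining per-location contributions into a document-wide score can be illustrated with a minimal Python sketch. It is not the paper's model or derivation: it assumes the textbook weight idf(t) = log(N / df(t)), treats documents as plain token lists, and uses hypothetical function names (`idf`, `score_by_locations`, `score_tf_idf`). It only shows that adding an IDF-sized contribution at every location where a query term occurs gives the same total as the familiar TF × IDF sum.

```python
import math
from collections import Counter


def idf(term, docs):
    """Textbook IDF (an assumption here): log(N / df), or 0.0 if the term is unseen."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df) if df else 0.0


def score_by_locations(query, doc, docs):
    """Visit every location of the document and add idf(t) where a query term occurs."""
    query_terms = set(query)
    total = 0.0
    for token in doc:  # one "local" contribution per location
        if token in query_terms:
            total += idf(token, docs)
    return total


def score_tf_idf(query, doc, docs):
    """Conventional TF-IDF score: sum of tf(t, d) * idf(t) over distinct query terms."""
    tf = Counter(doc)
    return sum(tf[t] * idf(t, docs) for t in set(query))


# Toy collection (made up for illustration only).
docs = [
    ["tf", "idf", "tf", "weights"],
    ["idf", "retrieval"],
    ["retrieval", "model"],
]
query = ["tf", "idf"]

for doc in docs:
    by_loc = score_by_locations(query, doc, docs)
    by_tf = score_tf_idf(query, doc, docs)
    print(doc, round(by_loc, 4), round(by_tf, 4))
```

Under these assumptions the two functions agree for every document, because a term occurring tf times contributes idf(t) once per occurrence. The paper's model is more general than this toy equivalence; the sketch is meant only as an intuition for how local contributions can add up to a TF-IDF weight.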