There are regularities in the statistical information that natural language terms provide about neighboring terms. We find that as phrase rank increases, moving from common to less common phrases, the expected mutual information measure (EMIM) between the terms decreases in a regular fashion. Luhn's model suggests that midrange terms are the best index terms and relevance discriminators. We propose an explanation for this principle based on the empirical relationship shown here between the rank of terms within phrases and the average mutual information between terms, which we refer to as the Inverse Representation – EMIM principle. We also propose an Inverse EMIM term weight for indexing and retrieval applications that is consistent with Luhn's distribution. An information theoretic interpretation of Zipf's Law is provided: using the regularity noted here, we suggest that Zipf's Law is a consequence of the statistical dependencies that exist between terms, described here using information theoretic concepts.
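The EMIM between two terms is conventionally computed from a 2x2 contingency table of document-level presence/absence counts. The following is a minimal sketch of that computation, not taken from the paper itself; the function name, counting scheme, and use of base-2 logarithms are illustrative assumptions:

```python
import math

def emim(n_xy, n_x, n_y, n):
    """Expected mutual information measure between terms x and y,
    summed over all four presence/absence cells of the 2x2 table.

    n_xy: documents containing both x and y
    n_x:  documents containing x
    n_y:  documents containing y
    n:    total number of documents
    """
    total = 0.0
    for x_present in (True, False):
        for y_present in (True, False):
            # Joint document count for this cell of the contingency table.
            if x_present and y_present:
                joint = n_xy
            elif x_present:
                joint = n_x - n_xy
            elif y_present:
                joint = n_y - n_xy
            else:
                joint = n - n_x - n_y + n_xy
            p_x = n_x / n if x_present else 1 - n_x / n
            p_y = n_y / n if y_present else 1 - n_y / n
            p_xy = joint / n
            if p_xy > 0:  # 0 * log(0) is taken as 0
                total += p_xy * math.log2(p_xy / (p_x * p_y))
    return total
```

For statistically independent terms (e.g. `emim(25, 50, 50, 100)`) the measure is zero, and it grows as the terms become more strongly dependent; under the regularity described above, common (low-rank) phrases would yield larger values than rare ones.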