An interpretation of index term weighting schemes based on document components

Authors: K. L. Kwok
Affiliations: Computer Science Department, Queens College, City University of New York, Flushing, NY
Venue: Proceedings of the 9th annual international ACM SIGIR conference on Research and development in information retrieval
Year: 1986

Citing 5

A probabilistic theory of indexing and similarity measure based on cited and citing documents
Journal of the American Society for Information Science

Composite document extended retrieval: an overview
SIGIR '85 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval

On Relevance, Probabilistic Indexing and Information Retrieval
Journal of the ACM (JACM)

Automatic abstracting and indexing—survey and recommendations
Communications of the ACM

Automatic Information Organization and Retrieval.
Automatic Information Organization and Retrieval.

Cited 8

Some considerations for using approximate optimal queries
SIGIR '87 Proceedings of the 10th annual international ACM SIGIR conference on Research and development in information retrieval

A neural network for probabilistic information retrieval
SIGIR '89 Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval

Probabilistic document indexing from relevance feedback data
SIGIR '90 Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval

Experiments with a component theory of probabilistic information retrieval based on single terms as document components
ACM Transactions on Information Systems (TOIS)

Query modification and expansion in a network with adaptive architecture
SIGIR '91 Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval

A probabilistic learning approach for document indexing
ACM Transactions on Information Systems (TOIS) - Special issue on research and development in information retrieval

Generation and Evaluation of Indexes for Chemistry Articles
Journal of Intelligent Information Systems

An automated system that assists in the generation of document indexes
Natural Language Engineering

Abstract

A theory of indexing is presented, based on viewing a document as constituted of components. A component may be any unit of running text that can be (a) judged for relevance and (b) treated as independent within the document. By looking at the constituent components of a document in relation to the universe of all components from the collection, we have been able to apply Bayes' decision theory to derive the index term representation for the document, and to attach an initial probabilistic weight to each term based on a Principle of Document Self-Recovery. It turns out that different choices of document component, such as a word or a whole abstract, lead to different term weighting schemes that have been introduced before on probabilistic grounds: specifically, Edmundson and Wyllys' term significance formula, and Sparck Jones' inverse document frequency, which was later modified by Croft and Harper into the 'combination match' formula. Thus, a unified interpretation of various probabilistic term weighting schemes appears possible.
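
For orientation, two of the weighting schemes named in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the commonly quoted forms log(N/n) for inverse document frequency and C + log((N - n)/n) for the combination match weight, where N is the collection size and n the number of documents containing the term. It does not reproduce the component-based derivation, which the abstract does not spell out. The function names, the constant `c`, and the toy collection are hypothetical.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so a term counts once per document
    return df


def idf_weight(num_docs, doc_freq):
    """Inverse document frequency in the commonly quoted form log(N / n)."""
    if not 0 < doc_freq <= num_docs:
        raise ValueError("doc_freq must satisfy 0 < n <= N")
    return math.log(num_docs / doc_freq)


def combination_match_weight(num_docs, doc_freq, c=1.0):
    """Combination match weight in the commonly quoted form C + log((N - n) / n).

    The constant c is a tunable assumption; its value here is arbitrary.
    """
    if not 0 < doc_freq < num_docs:
        raise ValueError("doc_freq must satisfy 0 < n < N")
    return c + math.log((num_docs - doc_freq) / doc_freq)


# Hypothetical toy collection: each document is a list of index terms.
toy_docs = [
    ["index", "term", "weight"],
    ["term", "weight", "probability"],
    ["document", "component", "term"],
    ["relevance", "feedback", "weight"],
]

df = document_frequencies(toy_docs)
n_docs = len(toy_docs)
for term in ("index", "weight"):
    print(
        term,
        round(idf_weight(n_docs, df[term]), 3),
        round(combination_match_weight(n_docs, df[term]), 3),
    )
```

Both weights fall as a term appears in more documents, so the rare term "index" scores higher than the frequent term "weight" under either scheme.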