Document length normalization using effective level of term frequency in large collections

Authors:
Soheila Karbasi; Mohand Boughanem
Affiliations:
IRIT-SIG, Toulouse, France; IRIT-SIG, Toulouse, France
Venue:
ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval
Year:
2006

Abstract

The effectiveness of information retrieval systems depends largely on term weighting. Most current term-weighting approaches use term frequency normalization. We develop a method to assess the potential role of the term frequency-inverse document frequency measures that are commonly used in text retrieval systems. Since automatic information retrieval systems have to deal with documents of varying sizes and terms of varying frequencies, we carried out preliminary tests to evaluate the effect of term-weighting components on retrieval performance. Based on these preliminary tests, we identify a novel factor, the effective level of term frequency (EL), that represents the content of a document based on its length and maximum term frequency. This factor is used to find the main terms within the documents and an appropriate subset of documents containing the query terms. We show that not all document terms need to be considered when ranking a document with respect to a query. The experimental results indicate that EL is a significant factor in retrieving relevant documents, especially in large collections. Experiments were undertaken on TREC collections to evaluate the effectiveness of our proposal.
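
The abstract does not state the formula for EL, so it is not reproduced here. As background only, the following minimal sketch shows the standard ingredients the abstract mentions: term frequency normalized by the maximum term frequency in a document, combined with inverse document frequency. The function name `max_tf_normalized_weights`, the whitespace tokenization, and the `log(N / df)` form of idf are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def max_tf_normalized_weights(docs):
    """Return one {term: weight} dict per document.

    Illustrative weighting (an assumption, not the paper's EL factor):
        weight = (tf / max_tf_in_document) * log(N / df)
    """
    n_docs = len(docs)
    # Naive tokenization: lowercase and split on whitespace.
    term_counts = [Counter(doc.lower().split()) for doc in docs]

    # Document frequency: number of documents containing each term.
    df = Counter()
    for counts in term_counts:
        df.update(counts.keys())

    weights = []
    for counts in term_counts:
        doc_weights = {}
        if counts:
            max_tf = max(counts.values())
            for term, tf in counts.items():
                normalized_tf = tf / max_tf
                idf = math.log(n_docs / df[term])
                doc_weights[term] = normalized_tf * idf
        weights.append(doc_weights)
    return weights


if __name__ == "__main__":
    sample = ["retrieval retrieval term", "term weighting"]
    for doc_weights in max_tf_normalized_weights(sample):
        print(doc_weights)
```

Dividing by the maximum term frequency keeps every normalized frequency in the range (0, 1], which reduces the advantage that long documents gain from raw counts; a term that occurs in every document receives an idf of zero under this particular idf form.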