The effectiveness of information retrieval systems depends largely on term weighting. Most current term-weighting approaches use term frequency normalization. We develop a method to assess the potential role of the term frequency-inverse document frequency (TF-IDF) measures commonly used in text retrieval systems. Because automatic information retrieval systems must handle documents of varying sizes and terms of varying frequencies, we carried out preliminary tests to evaluate the effect of term-weighting components on retrieval performance. Based on these tests, we identify a novel factor, the effective level of term frequency (EL), that represents document content based on document length and maximum term frequency. This factor is used to find the main terms within each document and an appropriate subset of documents containing the query terms. We show that not all document terms need to be considered when ranking a document with respect to a query. Experiments were undertaken on TREC collections to evaluate the effectiveness of our proposal. The results indicate that EL is a significant factor in retrieving relevant documents, especially in large collections.
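As a point of reference, the conventional TF-IDF weighting with maximum-term-frequency normalization discussed above can be sketched as follows. This is a minimal illustration of the standard augmented-TF scheme only. It is not the effective level of term frequency (EL) factor, whose definition is not given in this abstract. The function names, the 0.5 smoothing constant, and the toy documents are assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting (a common textbook form, not taken from the abstract):
      tf  = 0.5 + 0.5 * count / max_count_in_document
      idf = log(N / document_frequency)
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        # Normalizing by the most frequent term dampens the effect of length.
        max_count = max(counts.values()) if counts else 1
        weights.append({
            term: (0.5 + 0.5 * c / max_count) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weights


def score(query_terms, doc_weights):
    """Sum the weights of the query terms that occur in the document."""
    return sum(doc_weights.get(t, 0.0) for t in query_terms)


# Hypothetical toy collection, already tokenized.
docs = [
    "term weighting for text retrieval".split(),
    "document length normalization in retrieval".split(),
    "term frequency term frequency tuning".split(),
]
weights = tfidf_weights(docs)
query = ["term", "frequency"]
ranking = sorted(range(len(docs)),
                 key=lambda i: score(query, weights[i]),
                 reverse=True)
print(ranking)
```

Here every term of every document receives a weight. The abstract's claim is that ranking can instead rely on a subset of main terms selected with EL.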