As the volume of information grows, effective information retrieval methods become increasingly essential. This paper develops a new method to assess the potential role of the term frequency-inverse document frequency (TF-IDF) measures commonly used in text retrieval systems based on the vector space model. We carried out preliminary tests to determine the effect of term-weighting components on retrieval performance in a basic vector space scheme. Based on these tests, we identify a novel factor, the effective level of term frequency (EL), that represents a document's content based on its length and maximum term frequency. This factor is used to find the top principal terms within each document and an appropriate subset of documents containing the query terms. Our proposed method (Top-Term Ranking) uses a reduced indexing view of the original terms, in which only the principal terms of each document are considered for weighting. Our experiments on TREC collections indicate that EL is a significant factor in retrieving relevant documents, especially in large collections. The Top-Term Ranking method aims to improve the performance of large-scale information retrieval systems beyond that of common vector space methods.
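The idea of a reduced indexing view can be illustrated with a minimal Python sketch. This is not the Top-Term Ranking method itself: the abstract does not give the formula for EL, so the sketch assumes a fixed cutoff `k` as a stand-in for the EL-derived number of principal terms. The function names `build_reduced_index` and `rank`, the whitespace tokenizer, the max-normalised term frequency, and the smoothed IDF are likewise illustrative assumptions.

```python
import math
from collections import Counter


def build_reduced_index(docs, k):
    """Keep only the k highest-weighted terms of each document.

    Assumption: k is a fixed cutoff standing in for the EL factor,
    whose formula is not stated in the abstract.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    index = []
    for tokens in tokenized:
        tf = Counter(tokens)
        max_tf = max(tf.values()) if tf else 1
        # Max-normalised term frequency times a smoothed IDF (assumed weighting).
        weights = {
            term: (freq / max_tf) * math.log(1 + n_docs / df[term])
            for term, freq in tf.items()
        }
        # Sort by descending weight, breaking ties alphabetically, and keep the top k.
        top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
        index.append(dict(top))
    return index, df, n_docs


def rank(query, index, df, n_docs):
    """Rank documents by cosine similarity over the reduced term vectors."""
    q_tf = Counter(query.lower().split())
    # Query terms unseen in the collection are ignored.
    q_vec = {
        term: freq * math.log(1 + n_docs / df[term])
        for term, freq in q_tf.items()
        if term in df
    }
    q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
    scores = []
    for doc_id, d_vec in enumerate(index):
        dot = sum(w * d_vec.get(term, 0.0) for term, w in q_vec.items())
        if dot > 0:  # a positive dot product implies both norms are non-zero
            d_norm = math.sqrt(sum(w * w for w in d_vec.values()))
            scores.append((doc_id, dot / (q_norm * d_norm)))
    return sorted(scores, key=lambda s: -s[1])


if __name__ == "__main__":
    docs = [
        "retrieval retrieval retrieval models for text",
        "cooking pasta pasta with tomato sauce",
        "text text retrieval evaluation on large collections",
    ]
    index, df, n_docs = build_reduced_index(docs, k=2)
    print(rank("retrieval", index, df, n_docs))
```

With a small `k`, a document matches a query only through its retained terms, so the candidate set shrinks along with the index. That is the trade-off the reduced indexing view relies on.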