TF-IDF is widely used as a term-weighting scheme in today's information retrieval systems. However, computation time and cost have become major concerns for its application. This study investigated the similarities and differences between IDF distributions computed over the global collection and over different samples, and tested the stability of the IDF measure across collections. A more efficient algorithm based on random samples produced a good approximation to the IDF computed over the entire collection, with less computational overhead. This practice may be particularly informative and helpful for analysis of large databases or dynamic environments such as the Web.
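The idea of estimating IDF from a random sample can be illustrated with a minimal sketch. It is not the study's algorithm: the whitespace tokenizer, the smoothed formula log((N + 1) / (df + 1)), the function names, and the mean-absolute-difference comparison are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def idf_table(docs):
    """Return {term: idf} with a smoothed idf = log((N + 1) / (df + 1)).

    The smoothing and the whitespace tokenizer are illustrative assumptions.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        # Count each term at most once per document (document frequency).
        df.update(set(doc.lower().split()))
    return {term: math.log((n_docs + 1) / (count + 1)) for term, count in df.items()}


def sampled_idf_table(docs, sample_size, seed=0):
    """Estimate IDF from a uniform random sample of documents."""
    rng = random.Random(seed)
    sample = rng.sample(docs, sample_size)
    return idf_table(sample)


def mean_abs_diff(full, approx):
    """Mean absolute IDF difference over terms present in both tables."""
    shared = [term for term in approx if term in full]
    if not shared:
        return float("nan")
    return sum(abs(full[t] - approx[t]) for t in shared) / len(shared)


if __name__ == "__main__":
    collection = [
        "term weighting in text retrieval",
        "information retrieval on the web",
        "sampling the web collection",
        "term frequency and document frequency",
    ]
    full = idf_table(collection)
    approx = sampled_idf_table(collection, sample_size=2, seed=1)
    print(mean_abs_diff(full, approx))
```

Terms that do not occur in the sample get no estimate here; a real system would need a policy for them, such as a default maximum IDF.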