The feature quantity: an information theoretic perspective of Tfidf-like measures

Authors:
Akiko Aizawa
Affiliations:
National Institute of Informatics, 2-1-2 Hitotsubashi Chiyoda-ku, Tokyo 101-8430, Japan
Venue:
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2000

Citing 7
Cited 17

The effect of adding relevance information in a relevance feedback environment

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
A theory of term weighting based on exploratory data analysis

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A re-examination of text categorization methods

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Variance based classifier comparison in text categorization (poster session)

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Hierarchically Classifying Documents Using Very Few Words

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning

An information-theoretic perspective of tf-idf measures

Information Processing and Management: an International Journal
Unitary operators on the document space

Journal of the American Society for Information Science and Technology - Mathematical, logical, and formal methods in information retrieval
Dempster-Shafer Theory for a Query-Biased Combination of Evidence on the Web

Information Retrieval
A method of cluster-based indexing of textual data

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Using contextual spelling correction to improve retrieval effectiveness in degraded text collections

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Text classification based on the bias of word frequency over categories

AIA'06 Proceedings of the 24th IASTED international conference on Artificial intelligence and applications
A comparison and semi-quantitative analysis of words and character-bigrams as features in Chinese text categorization

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
The influence of indexing practices and weighting algorithms on document spaces

Journal of the American Society for Information Science and Technology
Authority-based keyword search in databases

ACM Transactions on Database Systems (TODS)
Supervised document classification based upon domain-specific term taxonomies

International Journal of Metadata, Semantics and Ontologies
Improving the performance of personal name disambiguation using web directories

Information Processing and Management: an International Journal
Exploiting Category Information and Document Information to Improve Term Weighting for Text Categorization

CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
Japanese text classification using N-gram and the maximum ratio of term frequency among categories

ASC '07 Proceedings of The Eleventh IASTED International Conference on Artificial Intelligence and Soft Computing
Word weighting based on user's browsing history

UM'03 Proceedings of the 9th international conference on User modeling
Difference-similitude matrix in text classification

FSKD'05 Proceedings of the Second international conference on Fuzzy Systems and Knowledge Discovery - Volume Part II
Detecting social spam campaigns on twitter

ACNS'12 Proceedings of the 10th international conference on Applied Cryptography and Network Security
Information-theoretic term weighting schemes for document clustering

Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries

Abstract

The feature quantity, a quantitative representation of specificity introduced in this paper, is based on an information-theoretic perspective of co-occurrence events between terms and documents. Mathematically, the feature quantity is defined as a product of probability and information, and it corresponds closely to the tfidf-like measures widely used in today's IR systems. In this paper, we present a formal description of the feature quantity, together with illustrative examples of applying it to two information retrieval tasks: representative term selection and text categorization.
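
The abstract does not give the formula, so the following is only a rough sketch of what "a product of probability and information" could look like. It assumes, as an illustration rather than the paper's definition, that each term-document pair contributes P(t,d) * log2(P(t,d) / (P(t) P(d))), which is that pair's share of the mutual information between terms and documents. The function name `pair_contributions` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def pair_contributions(docs):
    """Score each (term, doc_index) pair as probability times information.

    ASSUMPTION: the score is P(t,d) * log2(P(t,d) / (P(t) * P(d))), with all
    probabilities estimated from raw occurrence counts. This is one reading of
    "a product of probability and information", not the paper's stated formula.
    """
    # Count how often each term occurs in each document.
    pair_counts = Counter()
    for j, doc in enumerate(docs):
        for term in doc:
            pair_counts[(term, j)] += 1

    total = sum(pair_counts.values())

    # Marginal counts for terms and for documents.
    term_counts = Counter()
    doc_counts = Counter()
    for (term, j), count in pair_counts.items():
        term_counts[term] += count
        doc_counts[j] += count

    scores = {}
    for (term, j), count in pair_counts.items():
        p_td = count / total
        p_t = term_counts[term] / total
        p_d = doc_counts[j] / total
        scores[(term, j)] = p_td * math.log2(p_td / (p_t * p_d))
    return scores


# Toy corpus: "a" occurs only in document 0, "b" is spread evenly over both.
docs = [["a", "a", "b"], ["b", "c", "c"]]
scores = pair_contributions(docs)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 4))
```

Under this assumption the probability factor grows with how often the term occurs in the document, and the logarithmic factor is larger for terms concentrated in few documents, which is the loose analogy to the tf and idf components. A term spread evenly across documents, like "b" in the toy corpus, scores zero.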