The feature quantity, a quantitative representation of specificity introduced in this paper, is based on an information-theoretic perspective of co-occurrence events between terms and documents. Mathematically, the feature quantity is defined as a product of probability and information, and it corresponds closely to the tfidf-like measures widely used in today's IR systems. We present a formal description of the feature quantity, along with illustrative examples of applying it to two information retrieval tasks: representative term selection and text categorization.
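
To make the "product of probability and information" idea concrete, here is a minimal Python sketch of one plausible reading of it. The abstract does not give the formula, so the specifics below are assumptions: each term occurrence in a document is treated as one co-occurrence event, the "probability" factor is taken to be P(term), and the "information" factor is taken to be the Kullback-Leibler divergence between P(doc | term) and P(doc). The function name `feature_quantity` and the toy corpus are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def feature_quantity(docs):
    """Score each term as P(term) * KL(P(doc | term) || P(doc)).

    `docs` is a list of tokenized documents (lists of strings).
    This is an assumed formulation, not necessarily the paper's exact one.
    """
    # Every term occurrence counts as one term-document co-occurrence event.
    total = sum(len(d) for d in docs)
    if total == 0:
        return {}

    p_doc = [len(d) / total for d in docs]      # P(doc)
    counts = [Counter(d) for d in docs]         # per-document term counts

    term_freq = Counter()                       # corpus-wide term counts
    for c in counts:
        term_freq.update(c)

    scores = {}
    for term, freq in term_freq.items():
        p_term = freq / total                   # the "probability" factor
        info = 0.0                              # the "information" factor, in bits
        for c, pd in zip(counts, p_doc):
            if c[term]:
                p_doc_given_term = c[term] / freq
                info += p_doc_given_term * math.log2(p_doc_given_term / pd)
        scores[term] = p_term * info
    return scores


# Toy corpus: "a" is spread evenly over both documents, "b" and "c" are not.
toy_docs = [["a", "b"], ["a", "c"]]
toy_scores = feature_quantity(toy_docs)
```

Under these assumptions, a term that is spread over documents in proportion to their lengths scores zero, while a term concentrated in few documents scores higher, which mirrors the behavior of an idf factor scaled by term frequency. Summing the scores over all terms gives the mutual information between terms and documents in this formulation.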