Using Kullback-Leibler distance for text categorization

Authors:
Brigitte Bigi
Affiliations:
CLIPS-IMAG Laboratory, UMR CNRS, Grenoble cedex 9, France
Venue:
ECIR'03 Proceedings of the 25th European conference on IR research
Year:
2003

Abstract

A text categorization system assigns appropriate categories from a predefined classification scheme to incoming documents. These assignments can serve various purposes, such as filtering or retrieval. This paper introduces a new, effective model for text categorization over a large corpus (roughly 1 million documents). Categorization is performed using the Kullback-Leibler distance (KLD) between the probability distribution of the document to classify and the probability distribution of each category. With the same representation of categories, experiments show that the KLD method achieves substantial improvements over the tfidf method.
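The idea of assigning a document to the category whose term distribution is closest in Kullback-Leibler distance can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the exact formula, so the symmetrised form of the divergence, the `EPSILON` floor for unseen terms, and the helper names `term_distribution`, `kl_distance`, and `categorize` are all assumptions made for illustration.

```python
import math
from collections import Counter

# Assumed smoothing floor for terms absent from a text (not from the paper).
EPSILON = 1e-6


def term_distribution(tokens, vocabulary, epsilon=EPSILON):
    """Relative term frequencies over `vocabulary`, floored at epsilon and renormalised."""
    counts = Counter(tokens)
    total = sum(counts[t] for t in vocabulary)
    raw = {
        t: (counts[t] / total if total and counts[t] else epsilon)
        for t in vocabulary
    }
    norm = sum(raw.values())
    return {t: p / norm for t, p in raw.items()}


def kl_distance(p, q):
    """Symmetrised KL divergence: sum over terms of (p - q) * log(p / q).

    Both distributions must be strictly positive on the same vocabulary.
    """
    return sum((p[t] - q[t]) * math.log(p[t] / q[t]) for t in p)


def categorize(doc_tokens, category_tokens):
    """Return the closest category name and the distance to every category."""
    vocabulary = set(doc_tokens)
    for tokens in category_tokens.values():
        vocabulary.update(tokens)
    vocabulary = sorted(vocabulary)

    doc_dist = term_distribution(doc_tokens, vocabulary)
    distances = {
        name: kl_distance(doc_dist, term_distribution(tokens, vocabulary))
        for name, tokens in category_tokens.items()
    }
    return min(distances, key=distances.get), distances


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    categories = {
        "sports": "ball goal team match goal".split(),
        "finance": "bank stock market bank rate".split(),
    }
    best, distances = categorize("team goal match".split(), categories)
    print(best, distances)
```

The epsilon floor keeps every probability strictly positive, so the logarithm is always defined. A real system would build each category distribution from many training documents and would likely use a more careful smoothing scheme.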