A text categorization system assigns appropriate categories from a predefined classification scheme to incoming documents. These assignments can serve purposes such as filtering or retrieval. This paper introduces a new, effective model for text categorization on a large corpus (approximately 1 million documents). Each document is categorized using the Kullback-Leibler distance between the probability distribution of the document and the probability distribution of each category. Experiments that use the same category representation for both methods show that the KLD method achieves substantial improvements over the tf-idf-based method.
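The decision rule described above can be sketched in a few lines of Python: build a term distribution for each category, build one for the document, and pick the category with the smallest Kullback-Leibler distance D(P_d || P_c) = Σ_t P_d(t) log(P_d(t) / P_c(t)). The abstract does not say how terms unseen in a category are handled. This matters because D is infinite when P_c(t) = 0 while P_d(t) > 0. The additive smoothing parameter `alpha` is therefore an assumption, not the paper's method. The helper names `term_distribution`, `kl_distance`, `train` and `classify` and the toy corpus are also hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def term_distribution(tokens, vocabulary, alpha=0.0):
    """P(t) over `vocabulary` from token counts, with additive smoothing `alpha` (assumption).

    Tokens outside the vocabulary are ignored.
    """
    counts = Counter(t for t in tokens if t in vocabulary)
    total = sum(counts.values()) + alpha * len(vocabulary)
    if total == 0:
        raise ValueError("no in-vocabulary tokens to build a distribution from")
    return {t: (counts[t] + alpha) / total for t in vocabulary}


def kl_distance(p_doc, p_cat):
    """D(P_doc || P_cat) = sum_t P_doc(t) * log(P_doc(t) / P_cat(t)).

    Terms with P_doc(t) = 0 contribute 0; P_cat must be > 0 wherever P_doc > 0.
    """
    return sum(p * math.log(p / p_cat[t]) for t, p in p_doc.items() if p > 0)


def train(labelled_docs, alpha=1.0):
    """Build one smoothed term distribution per category (hypothetical helper)."""
    if alpha <= 0:
        raise ValueError("alpha must be > 0 so every category covers the whole vocabulary")
    vocabulary = {t for tokens, _ in labelled_docs for t in tokens}
    tokens_by_cat = {}
    for tokens, cat in labelled_docs:
        tokens_by_cat.setdefault(cat, []).extend(tokens)
    models = {cat: term_distribution(toks, vocabulary, alpha)
              for cat, toks in tokens_by_cat.items()}
    return vocabulary, models


def classify(tokens, vocabulary, models):
    """Return the category whose distribution is closest to the document's in KL distance."""
    p_doc = term_distribution(tokens, vocabulary)  # unsmoothed empirical distribution
    return min(models, key=lambda cat: kl_distance(p_doc, models[cat]))


# Toy corpus, invented purely for illustration.
docs = [
    ("stock market shares fell".split(), "finance"),
    ("bank interest rates rose".split(), "finance"),
    ("team won the football match".split(), "sport"),
    ("player scored a late goal".split(), "sport"),
]
vocab, models = train(docs)
print(classify("shares and interest rates".split(), vocab, models))  # → finance
```

The entropy H(P_d) of the document is the same for every category. Minimizing D(P_d || P_c) is therefore equivalent to minimizing the cross-entropy of P_c relative to P_d. Under this sketch's assumptions, the rule ranks categories exactly as a multinomial likelihood with uniform class priors would. Any smoothing scheme that keeps P_c(t) > 0 over the vocabulary could replace the additive one.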