Clustering short texts is a difficult task in itself, and restricting them to a narrow domain poses an additional challenge for current clustering methods. We address this problem with a new measure of distance between documents, based on the symmetric Kullback-Leibler distance. Although this measure is commonly used to calculate the distance between two probability distributions, we have adapted it to obtain a distance value between two documents. We carried out experiments on two different narrow-domain corpora, and our findings indicate that the measure can be used for this problem, obtaining results comparable to those obtained with the Jaccard similarity measure.
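To make the idea concrete, the following is a minimal sketch of how a symmetric Kullback-Leibler distance between two documents might be computed. It is not the authors' exact adaptation, which the abstract does not specify. It assumes each document is a list of tokens, models each document as a term probability distribution over the joint vocabulary of the pair, and uses simple additive smoothing (the hypothetical `epsilon` parameter) so that terms absent from one document do not produce a division by zero. The symmetric form used here is one common definition, D(P||Q) + D(Q||P), which simplifies to the sum over terms of (p - q) log(p / q).

```python
import math
from collections import Counter


def term_distribution(tokens, vocabulary, epsilon=1e-3):
    """Smoothed term probabilities over a shared vocabulary.

    Additive smoothing with `epsilon` is an assumption of this sketch,
    not a detail taken from the abstract.
    """
    counts = Counter(tokens)
    total = len(tokens) + epsilon * len(vocabulary)
    return {term: (counts[term] + epsilon) / total for term in vocabulary}


def symmetric_kl(doc_a, doc_b, epsilon=1e-3):
    """Symmetric KL distance D(P||Q) + D(Q||P) between two token lists."""
    vocabulary = set(doc_a) | set(doc_b)
    p = term_distribution(doc_a, vocabulary, epsilon)
    q = term_distribution(doc_b, vocabulary, epsilon)
    # D(P||Q) + D(Q||P) collapses to sum((p - q) * log(p / q)).
    return sum((p[t] - q[t]) * math.log(p[t] / q[t]) for t in vocabulary)


# Toy documents, invented purely for illustration.
doc_1 = "short text clustering with distance measures".split()
doc_2 = "clustering short text in a narrow domain".split()
doc_3 = "particle swarm optimization for scheduling".split()

print(symmetric_kl(doc_1, doc_2))  # overlapping vocabulary: smaller distance
print(symmetric_kl(doc_1, doc_3))  # disjoint vocabulary: larger distance
```

The resulting pairwise distances could then be passed to any clustering method that accepts a distance matrix. The choice of smoothing matters for short texts, since most vocabulary terms are missing from any given document.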