Every day, millions of short texts are generated, and effective tools are required to organize and retrieve them. Because these documents are very short and their representations extremely sparse, directly applying standard text categorization methods is not effective. In this work we propose using distributional term representations (DTRs) for short-text categorization. DTRs represent terms by means of contextual information, given by document occurrence and term co-occurrence statistics. They therefore allow us to build enriched document representations that help to overcome, to some extent, the small-length and high-sparsity issues. We report experimental results on three challenging collections, using a variety of classification methods. These results show that using DTRs improves classification performance in short-text categorization.
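To make the idea of a distributional term representation concrete, the following is a minimal sketch, not the authors' implementation. It assumes raw counts with no weighting or normalization, since the abstract does not specify either, and it assumes a document is represented by summing the vectors of its terms. The function names (`dor_vectors`, `tcor_vectors`, `represent`) and the toy corpus are hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_vocab(docs):
    """Sorted list of all distinct terms in the tokenized corpus."""
    return sorted({t for d in docs for t in d})


def dor_vectors(docs, vocab):
    """Document occurrence statistics: each term becomes a vector over
    documents, where entry j is the term's raw count in document j.
    (Raw counts are an assumption of this sketch.)"""
    vecs = {t: [0.0] * len(docs) for t in vocab}
    for j, doc in enumerate(docs):
        for t, c in Counter(doc).items():
            vecs[t][j] = float(c)
    return vecs


def tcor_vectors(docs, vocab):
    """Term co-occurrence statistics: each term becomes a vector over the
    vocabulary, where entry k counts the documents in which the term and
    vocab[k] both appear. A term co-occurs with itself in this sketch."""
    index = {t: k for k, t in enumerate(vocab)}
    vecs = {t: [0.0] * len(vocab) for t in vocab}
    for doc in docs:
        present = set(doc)
        for t in present:
            for u in present:
                vecs[t][index[u]] += 1.0
    return vecs


def represent(doc, term_vecs, dim):
    """Enriched document vector: the sum of the vectors of the document's
    known terms (an assumed combination rule). Unknown terms are skipped."""
    out = [0.0] * dim
    for t in doc:
        v = term_vecs.get(t)
        if v is None:
            continue
        for k, x in enumerate(v):
            out[k] += x
    return out


# Toy corpus (hypothetical), already tokenized.
docs = [
    ["cheap", "flight", "deal"],
    ["flight", "delay", "news"],
    ["cheap", "hotel", "deal"],
]
vocab = build_vocab(docs)
tcor = tcor_vectors(docs, vocab)
dor = dor_vectors(docs, vocab)

short_text = ["hotel", "flight"]
enriched_tcor = represent(short_text, tcor, len(vocab))
enriched_dor = represent(short_text, dor, len(docs))
```

In this toy example, a bag-of-words vector for `short_text` would have only two non-zero entries out of six, whereas `enriched_tcor` is non-zero in every dimension because each term contributes the context it was seen in. That is the sense in which such representations can reduce sparsity; whether this helps a classifier depends on the weighting and the collection.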