Distributional term representations for short-text categorization

Authors:
Juan Manuel Cabrera;Hugo Jair Escalante;Manuel Montes-y-Gómez
Affiliations:
Department of Computational Sciences, Instituto Nacional de Astrofísica, Óptica y Electrónica, Tonantzintla, Puebla, Mexico;Department of Computational Sciences, Instituto Nacional de Astrofísica, Óptica y Electrónica, Tonantzintla, Puebla, Mexico;Department of Computational Sciences, Instituto Nacional de Astrofísica, Óptica y Electrónica, Tonantzintla, Puebla, Mexico
Venue:
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2
Year:
2013

Abstract

Every day, millions of short texts are generated, and effective tools are required to organize and retrieve them. Because these documents are very short and their representations extremely sparse, standard text categorization methods are not effective when applied directly. In this work we propose using distributional term representations (DTRs) for short-text categorization. DTRs represent terms by means of contextual information, given by document occurrence and term co-occurrence statistics. They therefore allow us to build enriched document representations that help to overcome, to some extent, the small-length and high-sparsity issues. We report experimental results on three challenging collections, using a variety of classification methods. These results show that using DTRs improves classification performance in short-text categorization.
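
The idea of representing terms by document occurrence and term co-occurrence statistics, and then building document representations from those term vectors, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no weighting or normalization scheme, so raw counts are assumed here, whitespace tokenization is assumed, and a document is assumed to be represented by the plain sum of its term vectors. The function names (`tokenize`, `build_dor`, `build_tcor`, `represent`) and the toy corpus are hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real systems would
    # typically also remove stop words and apply stemming.
    return text.lower().split()


def build_dor(docs):
    """Document occurrence vectors: one dimension per training document.

    Each term is described by how often it occurs in each document
    (raw counts, an assumption of this sketch).
    """
    vocab = sorted({t for d in docs for t in tokenize(d)})
    dor = {t: [0.0] * len(docs) for t in vocab}
    for j, doc in enumerate(docs):
        for term, count in Counter(tokenize(doc)).items():
            dor[term][j] = float(count)
    return vocab, dor


def build_tcor(docs):
    """Term co-occurrence vectors: one dimension per vocabulary term.

    Each term is described by the number of documents in which it
    appears together with every other term.
    """
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}
    tcor = {t: [0.0] * len(vocab) for t in vocab}
    for doc in docs:
        terms = set(tokenize(doc))
        for t in terms:
            for u in terms:
                if u != t:
                    tcor[t][index[u]] += 1.0
    return vocab, tcor


def represent(text, term_vectors, dim):
    """Represent a document as the sum of the vectors of its known terms."""
    vec = [0.0] * dim
    for term in tokenize(text):
        term_vec = term_vectors.get(term)
        if term_vec is None:
            continue  # terms unseen in the collection contribute nothing
        for i, value in enumerate(term_vec):
            vec[i] += value
    return vec


# Toy corpus (hypothetical), standing in for a short-text collection.
docs = ["cheap flight deals", "flight delayed again", "cheap hotel deals"]
vocab, dor = build_dor(docs)
_, tcor = build_tcor(docs)

# A two-word text has two nonzero entries in a bag-of-words vector, but its
# DTR-based vector draws on the contexts of both words.
dor_vec = represent("cheap flight", dor, len(docs))
tcor_vec = represent("cheap flight", tcor, len(vocab))
```

In this sketch the resulting vectors are denser than a bag-of-words vector for the same text, which is the kind of enrichment the abstract refers to; they could then be passed to any standard classifier.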