Towards effective short text deep classification

Authors:
Xinruo Sun; Haofen Wang; Yong Yu
Affiliations:
Shanghai Jiao Tong University, Shanghai, China (all three authors)
Venue:
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Year:
2011

Abstract

Short texts such as ads and tweets are increasingly common on the Web. Classifying them into a large taxonomy, such as ODP or the Wikipedia category system, has become an important mining task for improving applications such as contextual advertising and topic detection for micro-blogging. In this paper, we propose a novel multi-stage classification approach to this problem. First, explicit semantic analysis is used to add features to both short texts and categories. Second, we use information retrieval techniques to fetch the categories most relevant to an input short text from thousands of candidates. Finally, an SVM classifier is applied to only the few selected categories to return the final answer. Our experimental results show that the proposed method achieves significant improvements in classification accuracy over several existing state-of-the-art approaches.
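
The three stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the toy `CONCEPTS` index stands in for a Wikipedia-derived explicit semantic analysis index, `CATEGORIES` and `TRAIN` are invented data, cosine ranking stands in for the retrieval stage, and the SVM is a small one-vs-rest linear SVM trained by subgradient descent on the hinge loss. All function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy concept index: concept -> {word: weight}.
# ESA proper derives such weights from TF-IDF over Wikipedia articles.
CONCEPTS = {
    "Sports": {"football": 1.0, "goal": 0.8, "match": 0.6, "league": 0.7, "tennis": 0.9},
    "Finance": {"stock": 1.0, "market": 0.8, "shares": 0.9, "bank": 0.7},
    "Cooking": {"recipe": 1.0, "oven": 0.8, "bake": 0.9, "flour": 0.7},
}
# Invented taxonomy (category -> short description) and labeled short texts.
CATEGORIES = {
    "Sports/Soccer": "football goal league match",
    "Sports/Tennis": "tennis match serve court",
    "Business/Investing": "stock market shares",
    "Home/Cooking": "recipe oven bake flour",
}
TRAIN = [
    ("late goal wins football match", "Sports/Soccer"),
    ("football league title race", "Sports/Soccer"),
    ("tennis serve wins the match", "Sports/Tennis"),
    ("tennis court final set", "Sports/Tennis"),
    ("stock market falls as bank shares drop", "Business/Investing"),
    ("bake bread with flour in the oven", "Home/Cooking"),
]

def expand(text):
    """Stage 1: bag of words enriched with ESA-style concept features."""
    words = text.lower().split()
    vec = Counter()
    for w in words:
        vec["w:" + w] += 1.0
    for concept, weights in CONCEPTS.items():
        score = sum(weights.get(w, 0.0) for w in words)
        if score > 0:
            vec["c:" + concept] = score
    return vec

def dot(a, b):
    return sum(v * b.get(f, 0.0) for f, v in a.items())

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

# Categories receive the same feature expansion as the short texts.
PROFILES = {c: expand(d) for c, d in CATEGORIES.items()}

def retrieve(vec, k):
    """Stage 2: rank all categories by similarity and keep the top k."""
    ranked = sorted(PROFILES, key=lambda c: cosine(vec, PROFILES[c]), reverse=True)
    return ranked[:k]

def train_linear_svm(examples, epochs=20, lr=0.1, lam=0.01):
    """L2-regularized hinge loss minimized by subgradient descent."""
    w = Counter()
    for _ in range(epochs):
        for x, y in examples:
            margin = y * dot(x, w)
            for f in w:  # shrink weights (regularization step)
                w[f] *= 1.0 - lr * lam
            if margin < 1.0:  # margin violated: step toward the example
                for f, v in x.items():
                    w[f] += lr * y * v
    return w

def classify(text, k=2):
    """Stage 3: one-vs-rest SVMs trained only on the retrieved candidates."""
    vec = expand(text)
    candidates = retrieve(vec, k)
    subset = [(expand(t), c) for t, c in TRAIN if c in candidates]
    scores = {}
    for c in candidates:
        binary = [(x, 1.0 if label == c else -1.0) for x, label in subset]
        scores[c] = dot(vec, train_linear_svm(binary))
    return max(candidates, key=scores.get)
```

On this toy data, `classify("football goal tonight")` retrieves the two sports categories and then selects `Sports/Soccer`. The point of the design is that the final classifier only has to separate the `k` retrieved candidates, so its training cost does not grow with the size of the full taxonomy.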