Understanding rapidly growing short text is very important. Short text differs from traditional documents in its shortness and sparsity, which hinders the application of conventional machine learning and text mining algorithms. Two major approaches have been exploited to enrich the representation of short text. One is to fetch contextual information of a short text to directly add more text; the other is to derive latent topics from an existing large corpus, which are then used as features to enrich the representation of the short text. The latter approach is elegant and efficient in most cases. The major trend along this direction is to derive latent topics at a certain granularity through well-known topic models such as latent Dirichlet allocation (LDA). However, topics at a single granularity are usually not sufficient to set up an effective feature space. In this paper, we move forward along this direction by proposing a method that leverages topics at multiple granularities, which can model short text more precisely. Taking short text classification as an example, we compare our proposed method with a state-of-the-art baseline on one open data set. Our method reduces the classification error by 20.25% and 16.68% respectively on two classifiers.
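The idea of multi-granularity topic features can be sketched as follows: train topic models with different numbers of topics and concatenate the resulting document-topic distributions into one enriched feature vector. This is only an illustrative sketch using scikit-learn's LDA implementation; the toy corpus, the topic counts (3 and 5), and the feature-combination step are assumptions for demonstration, not the paper's exact setup (which trains topics on a large external corpus).

```python
# Sketch: enrich short-text features with topic distributions at
# multiple granularities (illustrative; not the paper's exact method).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Tiny stand-in corpus; in practice topics come from a large external corpus.
docs = [
    "machine learning for text classification",
    "short text is sparse and noisy",
    "topic models such as latent dirichlet allocation",
    "web search engines measure semantic similarity",
    "clustering documents with wikipedia semantics",
    "feature spaces for sparse short snippets",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)

def topic_features(X, n_topics, seed=0):
    """Infer document-topic distributions at one granularity."""
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    return lda.fit_transform(X)  # shape: (n_docs, n_topics)

# Topics at two granularities: coarse (3 topics) and fine (5 topics).
coarse = topic_features(X, 3)
fine = topic_features(X, 5)

# Enriched representation: concatenate distributions across granularities
# (optionally alongside the original bag-of-words features) and feed the
# result to any standard classifier.
features = np.hstack([coarse, fine])
print(features.shape)  # (6, 8)
```

Each row of a granularity's output is a probability distribution over that granularity's topics, so concatenating several of them gives the classifier both coarse and fine topical views of the same short text.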