Short text classification by detecting information path

Authors:
Shitao Zhang;Xiaoming Jin;Dou Shen;Bin Cao;Xuetao Ding;Xiaochen Zhang
Affiliations:
School of Software, Tsinghua University, Beijing, China;School of Software, Tsinghua University, Beijing, China;Baidu Corporation, Beijing, China;Microsoft Research Asia, Beijing, China;School of Software, Tsinghua University, Beijing, China;School of Software, Tsinghua University, Beijing, China
Venue:
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
Year:
2013

Abstract

Short text is becoming ubiquitous in many modern information systems. Because short texts are brief and sparse, they contain few informative word co-occurrences, which makes classification on such data difficult. To overcome this difficulty, this paper proposes a new way to classify short texts effectively. Our method is based on a key observation: there usually exist ordered subsets in short texts, termed the "information path" in this work, and classifying each subset based on the classification results of some previous subsets can yield higher overall accuracy than classifying the entire data set directly. We propose a method to detect the information path and employ it in short text classification. Unlike state-of-the-art methods, our method does not require any external knowledge or corpus, which usually needs careful fine-tuning; this makes our method easier to apply and more robust across different data sets. Experiments on two real-world data sets show the effectiveness of the proposed method and its superiority over existing methods.
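
The abstract does not say how the information path is detected, so the following Python sketch only illustrates the general idea of classifying ordered subsets in stages, with each stage reusing the results of earlier stages. Everything in it is an assumption made for illustration: the function names (`train_nb`, `predict_nb`, `coverage`, `classify_along_path`), the `stage_size` parameter, the use of a small naive Bayes classifier, and in particular the ordering heuristic (texts that share more words with already-classified data come first). It should not be read as the authors' algorithm.

```python
from collections import Counter, defaultdict
import math


def train_nb(docs, labels):
    """Collect per-class word counts for a multinomial naive Bayes model."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def predict_nb(model, tokens):
    """Return the most probable class, using add-one smoothing."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):  # sorted for deterministic tie-breaking
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for w in tokens:
            if w in vocab:  # words the model has never seen carry no evidence
                score += math.log(
                    (word_counts[label][w] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def coverage(tokens, vocab):
    """Fraction of a text's tokens that the current model already knows."""
    return sum(w in vocab for w in tokens) / max(len(tokens), 1)


def classify_along_path(train_docs, train_labels, unlabeled, stage_size=2):
    """Classify texts in ordered stages (hypothetical stand-in for a path).

    Assumed heuristic: at each stage, take the texts best covered by the
    current vocabulary, classify them, then add them with their predicted
    labels to the training data used for the next stage.
    """
    docs, labels = list(train_docs), list(train_labels)
    remaining = list(range(len(unlabeled)))
    predictions = {}
    while remaining:
        model = train_nb(docs, labels)
        vocab = model[2]
        remaining.sort(key=lambda i: -coverage(unlabeled[i], vocab))
        stage, remaining = remaining[:stage_size], remaining[stage_size:]
        for i in stage:
            predictions[i] = predict_nb(model, unlabeled[i])
        for i in stage:
            docs.append(unlabeled[i])
            labels.append(predictions[i])
    return [predictions[i] for i in range(len(unlabeled))]


# Toy data, invented for this example only.
train_docs = [["goal", "match", "team"], ["stock", "market", "price"]]
train_labels = ["sports", "finance"]
unlabeled = [
    ["team", "coach"],
    ["coach", "striker"],
    ["market", "shares"],
    ["shares", "dividend"],
]
print(classify_along_path(train_docs, train_labels, unlabeled))
```

In this toy data, the second and fourth texts share no words with the labeled examples, so a classifier trained on the labeled data alone has no evidence for them. They only become classifiable after the first stage has added "coach" and "shares" to the model, which is the kind of dependency between subsets that the abstract describes.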