We propose a new method that uses data compression to classify an unseen tweet as related or unrelated to a topic of interest. Our compression-based tweet classification method, called CTC, evaluates how compressible the tweet is when given positive and negative examples. This lets the method handle multilingual tweets uniformly and make effective use of the tweet's word context, which is especially valuable information under the 140-character limit. Experiments on worldwide tweets that were each assigned a single hashtag demonstrate that our method, which uses the Deflate algorithm (used in gzip) for the empirical evaluation, achieved higher precision and recall than state-of-the-art online learning algorithms.
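The idea of scoring a tweet by its compressibility against example sets can be illustrated with a minimal Python sketch. This is not the CTC implementation: the scoring rule (extra compressed bytes needed once the tweet is appended to each example set), the function names `compressed_size`, `extra_bytes`, and `classify`, and the newline separator are all assumptions made for illustration. It uses the standard-library `zlib` module, which implements Deflate, and works on raw UTF-8 bytes so that no language-specific tokenization is involved.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the Deflate-compressed (zlib-wrapped) data."""
    return len(zlib.compress(data, 9))


def extra_bytes(examples: list[str], tweet: str) -> int:
    """Additional compressed bytes needed when the tweet follows the examples.

    A small value suggests the tweet shares many byte sequences with the
    examples. Assumption: examples are joined with newlines.
    """
    corpus = "\n".join(examples).encode("utf-8")
    with_tweet = corpus + b"\n" + tweet.encode("utf-8")
    return compressed_size(with_tweet) - compressed_size(corpus)


def classify(tweet: str, positives: list[str], negatives: list[str]) -> bool:
    """Return True if the tweet compresses better against the positive examples."""
    return extra_bytes(positives, tweet) < extra_bytes(negatives, tweet)
```

Deflate only finds matches within a 32 KB sliding window, so in a sketch like this only the most recent 32 KB of each example set can influence the score; how a practical system selects or orders its examples is left open here.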