Combining labeled and unlabeled data with co-training
COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Machine Learning
The Journal of Machine Learning Research
Pattern Recognition and Machine Learning (Information Science and Statistics)
Pattern Recognition and Machine Learning (Information Science and Statistics)
An Empirical Study of Learning from Imbalanced Data Using Random Forest
ICTAI '07 Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence - Volume 02
Introduction to Information Retrieval
Introduction to Information Retrieval
Combining Super-Structuring and Abstraction on Sequence Classification
ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
Earthquake shakes Twitter users: real-time event detection by social sensors
Proceedings of the 19th international conference on World wide web
On smoothing and inference for topic models
UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence
Short text classification in twitter to improve information filtering
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
The feature selection problem: traditional methods and a new algorithm
AAAI'92 Proceedings of the tenth national conference on Artificial intelligence
Sentiment in short strength detection informal text
Journal of the American Society for Information Science and Technology
Word co-occurrence features for text classification
Information Systems
Estimating continuous distributions in Bayesian classifiers
UAI'95 Proceedings of the Eleventh conference on Uncertainty in artificial intelligence
Automatic tag recommendation for metadata annotation using probabilistic topic modeling
Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
Hi-index | 0.00 |
Social media is emerging as a powerful source of communication, information dissemination and mining. Being colloquial and ubiquitous in nature makes it easier for users to express their opinions and preferences in a seamless, dynamic manner. Epidemic surveillance systems that utilize social media to detect the emergence of diseases have been proposed in the literature. These systems mostly employ traditional document classification techniques that represent a document with a bag of N-grams. However, such techniques are not optimal for social media where sparsity and noise are norms. The authors address the limitations posed by the traditional N-gram based methods and propose to use features that represent different semantic aspects of the data in combination with ensemble machine learning techniques to identify health-related messages in a heterogenous pool of social media data. Furthermore, the results reveal significant improvement in identifying health related social media content which can be critical in the emergence of a novel, unknown disease epidemic.