Collaborative online learning of user generated content

Authors:
Guangxia Li;Kuiyu Chang;Steven C.H. Hoi;Wenting Liu;Ramesh Jain
Affiliations:
Nanyang Technological University, Singapore, Singapore;Nanyang Technological University, Singapore, Singapore;Nanyang Technological University, Singapore, Singapore;Nanyang Technological University, Singapore, Singapore;University of California, Irvine, CA, USA
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Citing 8
Cited 1

Large Margin Classification Using the Perceptron Algorithm

Machine Learning - The Eleventh Annual Conference on computational Learning Theory
Ultraconservative online algorithms for multiclass problems

The Journal of Machine Learning Research
Task clustering and gating for bayesian multitask learning

The Journal of Machine Learning Research
Editorial: special issue on learning from imbalanced data sets

ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Learning Multiple Tasks with Kernel Methods

The Journal of Machine Learning Research
A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data

The Journal of Machine Learning Research
Online Passive-Aggressive Algorithms

The Journal of Machine Learning Research
LIBLINEAR: A Library for Large Linear Classification

The Journal of Machine Learning Research

Online feature selection for mining big data

Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications

Quantified Score

Hi-index	0.00

Visualization

Abstract

We study the problem of online classification of user generated content, with the goal of efficiently learning to categorize content generated by individual user. This problem is challenging due to several reasons. First, the huge amount of user generated content demands a highly efficient and scalable classification solution. Second, the categories are typically highly imbalanced, i.e., the number of samples from a particular useful class could be far and few between compared to some others (majority class). In some applications like spam detection, identification of the minority class often has significantly greater value than that of the majority class. Last but not least, when learning a classification model from a group of users, there is a dilemma: A single classification model trained on the entire corpus may fail to capture personalized characteristics such as language and writing styles unique to each user. On the other hand, a personalized model dedicated to each user may be inaccurate due to the scarcity of training data, especially at the very beginning; when users have written just a few articles. To overcome these challenges, we propose learning a global model over all users' data, which is then leveraged to continuously refine the individual models through a collaborative online learning approach. The class imbalance problem is addressed via a cost-sensitive learning approach. Experimental results show that our method is effective and scalable for timely classification of user generated content.