Accelerated Gradient Method for Multi-task Sparse Learning Problem

Authors:
Xi Chen; Weike Pan; James T. Kwok; Jaime G. Carbonell
Venue:
ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
Year:
2009

Cited by 8:

Anomaly localization for network data streams with graph joint sparse PCA. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining.
Content based social behavior prediction: a multi-task learning approach. Proceedings of the 20th ACM international conference on Information and knowledge management.
Learning Incoherent Sparse and Low-Rank Patterns from Multiple Tasks. ACM Transactions on Knowledge Discovery from Data (TKDD).
Efficient Euclidean projections via Piecewise Root Finding and its application in gradient projection. Neurocomputing.
Robust Visual Tracking via Structured Multi-Task Sparse Learning. International Journal of Computer Vision.
Learning with infinitely many features. Machine Learning.
Efficient online learning for multitask feature selection. ACM Transactions on Knowledge Discovery from Data (TKDD).
Lead-lag analysis via sparse co-projection in correlated text streams. Proceedings of the 22nd ACM international conference on Conference on information & knowledge management.

Abstract

Many real-world learning problems can be recast as multi-task learning problems, which exploit correlations among different tasks to obtain better generalization performance than learning each task individually. Feature selection in the multi-task setting has many applications in fields such as computer vision, text classification and bioinformatics. It can generally be formulated as an ℓ1,∞-regularized optimization problem, whose solution automatically yields joint sparsity across tasks (each feature is selected or discarded for all tasks together). However, because the ℓ1,∞ norm is nonsmooth, efficient training algorithms for such problems with general convex loss functions are lacking. In this paper, we propose an accelerated gradient method based on Nesterov's "optimal" first-order black-box method and provide its convergence rate for smooth convex loss functions. For nonsmooth convex loss functions, such as the hinge loss, our method still converges quickly in practice. Moreover, by exploiting the structure of the ℓ1,∞ ball, we solve the black-box oracle in Nesterov's method with a simple sorting scheme. Our method is suitable for large-scale multi-task learning problems because it uses only first-order information and is very easy to implement. Experimental results show that our method significantly outperforms state-of-the-art methods in both convergence speed and learning accuracy.
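The abstract combines three ingredients: an ℓ1,∞ penalty that couples tasks feature by feature, a Nesterov-style accelerated first-order loop, and a sorting-based oracle. The following is a minimal, hypothetical Python sketch of how these pieces can fit together; it is not the authors' implementation. Assumptions not taken from the paper: a squared loss per task; the penalized form min_W sum_k 1/2 ||X_k w_k - y_k||^2 + lam * sum_j max_k |W_jk| instead of a constrained ℓ1,∞-ball form; FISTA (Beck and Teboulle's accelerated proximal gradient, a Nesterov-type scheme) in place of the paper's exact update; and an oracle computed by Moreau decomposition. That decomposition reduces the prox of each feature row to a sort-based projection onto an ℓ1 ball. The function names project_l1_ball, prox_l1inf, grad_least_squares and accelerated_l1inf are illustrative only.

```python
# Hypothetical sketch: squared loss, penalized l1,inf form, FISTA-style
# Nesterov acceleration. Not the paper's exact algorithm or oracle.
import math
from typing import List

Matrix = List[List[float]]  # row-major; W is d x K (features x tasks)


def project_l1_ball(v: List[float], radius: float) -> List[float]:
    """Euclidean projection of v onto {x : sum_i |x_i| <= radius}, via sorting."""
    if radius <= 0:
        return [0.0] * len(v)
    if sum(abs(x) for x in v) <= radius:
        return list(v)
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        t = (cumsum - radius) / i
        if ui > t:  # keep the threshold from the largest such index
            theta = t
    # Soft-threshold magnitudes by theta, keeping the original signs.
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]


def prox_l1inf(W: Matrix, lam: float) -> Matrix:
    """Prox of lam * sum_j max_k |W[j][k]| (l1,inf norm over feature rows).

    The norm separates over rows, and by Moreau decomposition
    prox_{lam*||.||_inf}(v) = v - Proj_{l1 ball of radius lam}(v).
    """
    out = []
    for row in W:
        p = project_l1_ball(row, lam)
        out.append([a - b for a, b in zip(row, p)])
    return out


def grad_least_squares(Xs: List[Matrix], ys: List[List[float]], W: Matrix) -> Matrix:
    """Gradient of sum_k 0.5 * ||X_k w_k - y_k||^2, where w_k is column k of W."""
    d, K = len(W), len(W[0])
    G = [[0.0] * K for _ in range(d)]
    for k in range(K):
        for x_row, y in zip(Xs[k], ys[k]):
            r = sum(x_row[j] * W[j][k] for j in range(d)) - y
            for j in range(d):
                G[j][k] += r * x_row[j]
    return G


def accelerated_l1inf(Xs: List[Matrix], ys: List[List[float]], lam: float,
                      iters: int = 500) -> Matrix:
    """FISTA-style accelerated proximal gradient (hypothetical helper)."""
    d, K = len(Xs[0][0]), len(Xs)
    # Step 1/L with L = max_k ||X_k||_F^2, an upper bound on the largest
    # eigenvalue of X_k^T X_k, i.e. on the gradient's Lipschitz constant.
    L = max(max(sum(x * x for row in X for x in row) for X in Xs), 1e-12)
    W_prev = [[0.0] * K for _ in range(d)]
    Z = [row[:] for row in W_prev]
    t = 1.0
    for _ in range(iters):
        G = grad_least_squares(Xs, ys, Z)
        V = [[Z[j][k] - G[j][k] / L for k in range(K)] for j in range(d)]
        W = prox_l1inf(V, lam / L)  # the oracle step
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_next  # Nesterov momentum weight
        Z = [[W[j][k] + beta * (W[j][k] - W_prev[j][k]) for k in range(K)]
             for j in range(d)]
        W_prev, t = W, t_next
    return W_prev


# Usage: two tasks sharing three features, identity designs.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
W = accelerated_l1inf([I3, I3], [[3.0, 0.1, 0.0], [1.5, -0.2, 0.0]], lam=1.0)
print([[round(x, 6) for x in row] for row in W])
```

With identity designs the loss reduces to 1/2 ||W - Y||^2, so the iterates should approach the prox of Y itself, roughly [[2.0, 1.5], [0.0, 0.0], [0.0, 0.0]]. The second feature's row is zeroed in both tasks at once, which is the joint sparsity the abstract describes, while the first feature's larger entry is clipped. The Frobenius bound on L only keeps the sketch dependency-free; a tighter spectral estimate would typically allow larger steps.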