Unsupervised learning of feature hierarchies

  • Authors:
  • Yann Lecun;Marc'Aurelio Ranzato

  • Affiliations:
  • New York University;New York University

  • Venue:
  • Unsupervised learning of feature hierarchies
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

The applicability of machine learning methods is often limited by the amount of available labeled data, and by the ability (or inability) of the designer to produce good internal representations and good similarity measures for the input data vectors. The aim of this thesis is to alleviate these two limitations by proposing algorithms to learn good internal representations, and invariant feature hierarchies from unlabeled data. These methods go beyond traditional supervised learning algorithms, and rely on unsupervised, and semi-supervised learning. In particular, this work focuses on “deep learning” methods, a set of techniques and principles to train hierarchical models. Hierarchical models produce feature hierarchies that can capture complex non-linear dependencies among the observed data variables in a concise and efficient manner. After training, these models can be employed in real-time systems because they compute the representation by a very fast forward propagation of the input through a sequence of non-linear transformations. When the paucity of labeled data does not allow the use of traditional supervised algorithms, each layer of the hierarchy can be trained in sequence starting at the bottom by using unsupervised or semi-supervised algorithms. Once each layer has been trained, the whole system can be fine-tuned in an end-to-end fashion. We propose several unsupervised algorithms that can be used as building block to train such feature hierarchies. We investigate algorithms that produce sparse overcomplete representations and features that are invariant to known and learned transformations. These algorithms are designed using the Energy-Based Model framework and gradient-based optimization techniques that scale well on large datasets. The principle underlying these algorithms is to learn representations that are at the same time sparse, able to reconstruct the observation, and directly predictable by some learned mapping that can be used for fast inference in test time. With the general principles at the foundation of these algorithms, we validate these models on a variety of tasks, from visual object recognition to text document classification and retrieval.