Extending the bioinspired hierarchical temporal memory paradigm for sign language recognition

  • Authors:
  • David Rozado;Francisco B. Rodriguez;Pablo Varona

  • Affiliations:
  • GNB group at Escuela Politécnica Superior, Calle Francisco Tomás y Valiente, 11, Universidad Autónoma de Madrid, Madrid 28049, Spain (all authors)

  • Venue:
  • Neurocomputing
  • Year:
  • 2012

Abstract

Sign language recognition (SLR) using spatial positions and arrangements of the hands over time is a challenging multi-variable time series recognition problem with several potential applications. Here we explore, for SLR purposes, a hierarchically connected network of nodes based on a Bayesian-like paradigm known as hierarchical temporal memory (HTM), which models neocortical principles of organization and information coding. HTM is a broad paradigm for pattern recognition, control, attention and forward prediction that exploits the hierarchy in time and space existing in the physical world during both learning and inference. In this work we focus on HTM capabilities for pattern recognition. We extend the traditional HTM paradigm with an original top node in order to improve HTM performance in problems where instances unfold over time. The extended top node stores and compares sequences of spatio-temporally codified inputs to handle the temporal evolution of instances in sign language. Sequence comparison is carried out using the Needleman-Wunsch algorithm, a dynamic programming method for sequence alignment. We compare the performance of the extended HTM with traditional HTMs and with machine learning algorithms routinely used in the literature for SLR. The extended HTM improves on the performance of traditional HTM for SLR, reaching 91% recognition accuracy on a data set of 95 categories of Australian sign language. When sufficient training instances are available, the extended HTM matches or outperforms state-of-the-art methods for SLR such as Hidden Markov Models or Metafeatures T-Classes, without using a language model or pre-processing the sensor data. The extended HTM employs relatively small feature vectors in comparison to methods in the literature. Our method learns the spatio-temporal data structures and transitions that occur in the data without depending on manually predefined features to be searched for, and it works well in real time.
These results suggest that the extended HTM approach is a valid bioinspired alternative to existing SLR engines and that it can be successfully applied to other machine learning tasks whose input instances also unfold over time.
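
The sequence comparison step mentioned in the abstract can be illustrated with a minimal sketch of Needleman-Wunsch global alignment scoring. The abstract does not give the scoring scheme or the decision rule, so the values below (match = 1, mismatch = -1, gap = -1), the function names, and the nearest-stored-sequence classifier are all assumptions made for illustration, not the authors' implementation. Sequences are treated as lists of discrete symbols, standing in for the spatio-temporally codified inputs that the top node stores.

```python
from typing import Hashable, List, Sequence, Tuple


def needleman_wunsch_score(
    a: Sequence[Hashable],
    b: Sequence[Hashable],
    match: int = 1,      # assumed reward for equal symbols
    mismatch: int = -1,  # assumed penalty for differing symbols
    gap: int = -1,       # assumed linear gap penalty
) -> int:
    """Return the optimal global alignment score of sequences a and b."""
    # Row 0 of the dynamic programming table: align a prefix of b with gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: align a prefix of a with gaps only.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


def classify(
    query: Sequence[Hashable],
    stored: List[Tuple[str, Sequence[Hashable]]],
) -> str:
    """Hypothetical decision rule: label of the best-aligning stored sequence."""
    best_label, _ = max(
        stored, key=lambda item: needleman_wunsch_score(query, item[1])
    )
    return best_label


if __name__ == "__main__":
    # Toy symbol sequences; the labels and symbols are invented for the example.
    stored_sequences = [("hello", [1, 2, 3, 4]), ("thanks", [5, 6, 7, 8])]
    print(classify([1, 2, 4], stored_sequences))
```

Only the previous row of the table is kept, which is enough when just the score is needed; recovering the alignment itself would require the full table and a traceback.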