Statistical natural language processing (NLP) builds models of language from statistical features extracted from input text. We investigate deep learning methods for unsupervised feature learning in NLP tasks. Recent results indicate that features learned with deep learning methods are not a silver bullet: they do not consistently improve performance. We hypothesise that this shortfall stems from a disjoint training protocol, in which word representations are learned separately from the classifiers that consume them, leaving the two mismatched. We further hypothesise that modelling long-range dependencies in the input and, separately, in the output layers would yield additional gains. We propose methods for overcoming these limitations, which will form part of our final thesis work.
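
The hypothesised mismatch can be made concrete by comparing a classifier that keeps separately learned word vectors frozen (the disjoint protocol) against one that fine-tunes them jointly with the classifier. The following is a minimal PyTorch sketch of that comparison, not the thesis's actual method: the bag-of-words model, the dimensions, and the random stand-in for pretrained vectors are all illustrative assumptions.

    # Sketch: disjoint vs. joint training of word representations and a
    # classifier. All names and sizes here are illustrative assumptions.
    import torch
    import torch.nn as nn

    VOCAB, DIM, CLASSES = 1000, 50, 5

    # Stand-in for vectors learned by an unsupervised method (e.g. a
    # neural language model); random here purely for illustration.
    pretrained = torch.randn(VOCAB, DIM)

    class BagOfWordsClassifier(nn.Module):
        def __init__(self, finetune: bool):
            super().__init__()
            # freeze=True reproduces the disjoint protocol: the word
            # vectors are fixed and only the classifier adapts to the task.
            self.emb = nn.Embedding.from_pretrained(pretrained, freeze=not finetune)
            self.out = nn.Linear(DIM, CLASSES)

        def forward(self, token_ids):  # token_ids: (batch, seq_len)
            # Average the word vectors in each sequence, then classify.
            return self.out(self.emb(token_ids).mean(dim=1))

    disjoint = BagOfWordsClassifier(finetune=False)  # frozen embeddings
    joint = BagOfWordsClassifier(finetune=True)      # jointly trained

    tokens = torch.randint(0, VOCAB, (8, 12))   # toy batch of token ids
    labels = torch.randint(0, CLASSES, (8,))    # toy labels
    loss_fn = nn.CrossEntropyLoss()
    for model in (disjoint, joint):
        # Optimise only the parameters that are allowed to change.
        opt = torch.optim.SGD(
            (p for p in model.parameters() if p.requires_grad), lr=0.1)
        loss = loss_fn(model(tokens), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()

In the joint setting, gradients from the task loss flow back into the embedding table, so the word representations and the classifier are optimised against the same objective; under our hypothesis, this is what removes the mismatch introduced by the disjoint protocol.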