A unified architecture for natural language processing: deep neural networks with multitask learning

Authors:
Ronan Collobert; Jason Weston
Affiliations:
NEC Labs America, Princeton, NJ; NEC Labs America, Princeton, NJ
Venue:
Proceedings of the 25th international conference on Machine learning
Year:
2008

Abstract

We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words, and, using a language model, the likelihood that the sentence makes sense (grammatically and semantically). The entire network is trained jointly on all these tasks using weight sharing, an instance of multitask learning. All the tasks use labeled data except the language model, which is learnt from unlabeled text and represents a novel form of semi-supervised learning for the shared tasks. We show how both multitask learning and semi-supervised learning improve the generalization of the shared tasks, resulting in state-of-the-art performance.
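
The weight sharing described in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the toy vocabulary, layer sizes, label counts, and the names `shared`, `heads`, `sentence_features`, and `predict` are all assumptions made for illustration. Training, nonlinearities, and per-word tagging are omitted, so each head here scores the whole sentence.

```python
import random

random.seed(0)  # fixed seed so the toy weights are reproducible

# Hypothetical toy vocabulary and layer sizes, chosen only for illustration.
VOCAB = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5}
EMBED_DIM, HIDDEN_DIM, WINDOW = 4, 6, 3


def rand_matrix(rows, cols):
    """Small random weight matrix stored as a list of rows."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Parameters shared by every task: a word lookup table and one convolution.
shared = {
    "lookup": rand_matrix(len(VOCAB), EMBED_DIM),
    "conv": rand_matrix(HIDDEN_DIM, WINDOW * EMBED_DIM),
}

# One small output layer per task; the label counts are made up.
heads = {
    "pos": rand_matrix(5, HIDDEN_DIM),
    "chunk": rand_matrix(3, HIDDEN_DIM),
    "lm": rand_matrix(1, HIDDEN_DIM),  # single "does this make sense" score
}


def sentence_features(words):
    """Embed words, convolve over word windows, then take the max over time."""
    pad = [VOCAB["<pad>"]] * (WINDOW // 2)
    ids = pad + [VOCAB[w] for w in words] + pad
    windows = []
    for i in range(len(ids) - WINDOW + 1):
        # Concatenate the embeddings of the words in this window.
        concat = [x for j in ids[i:i + WINDOW] for x in shared["lookup"][j]]
        windows.append(matvec(shared["conv"], concat))
    # Max over time: keep the largest response of each hidden unit.
    return [max(column) for column in zip(*windows)]


def predict(task, words):
    """Raw scores for one task, computed from the shared sentence features."""
    return matvec(heads[task], sentence_features(words))


sentence = ["the", "cat", "sat", "on", "the", "mat"]
for task in heads:
    print(task, len(predict(task, sentence)))
```

Because every head reads the same `sentence_features`, a gradient step taken for any one task would also move the shared lookup table and convolution. In a setup like this, that is the route by which a language model trained on unlabeled text could influence the labeled tasks.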