Learning condensed feature representations from large unsupervised data sets for supervised learning

  • Authors:
  • Jun Suzuki;Hideki Isozaki;Masaaki Nagata

  • Affiliations:
  • NTT Communication Science Laboratories, NTT Corp., Hikaridai, Seika-cho, Soraku-gun, Kyoto, Japan (all authors)

  • Venue:
  • HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
  • Year:
  • 2011

Abstract

This paper proposes a novel approach for effectively utilizing unsupervised data in addition to supervised data for supervised learning. We use unsupervised data to generate informative 'condensed feature representations' from the original feature set used in supervised NLP systems. The main contribution of our method is that it offers dense, low-dimensional feature spaces for NLP tasks while maintaining the state-of-the-art performance provided by recently developed high-performance semi-supervised learning techniques. Our method matches the results of current state-of-the-art systems while using very few features: F-score 90.72 with 344 features on CoNLL-2003 NER data, and UAS 93.55 with 12.5K features on dependency parsing data derived from PTB-III.
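
To illustrate the general idea described in the abstract, the sketch below shows one way a large sparse feature space can be condensed into a dense, low-dimensional one learned from unlabeled data before supervised training. This is not the paper's actual algorithm; the use of scikit-learn, TruncatedSVD as the condenser, and the toy data are all assumptions made purely for illustration.

    # Illustrative sketch only (NOT the paper's method): condense a sparse
    # bag-of-words feature space into a dense, low-dimensional representation
    # learned from unlabeled data, then train a supervised classifier on it.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.linear_model import LogisticRegression

    # Hypothetical toy data standing in for a large unlabeled corpus
    # and a small labeled training set.
    unlabeled_texts = [
        "the cat sat on the mat",
        "dogs and cats are common pets",
        "stock prices fell sharply today",
        "the market closed lower on friday",
    ]
    train_texts = ["my cat chased the dog", "shares dropped after the report"]
    train_labels = ["pets", "finance"]

    # Original sparse feature space (bag-of-words), fit on the unlabeled data.
    vectorizer = CountVectorizer()
    X_unlabeled = vectorizer.fit_transform(unlabeled_texts)

    # Learn a condensed (dense, low-dimensional) feature space from the
    # unlabeled data alone.
    condenser = TruncatedSVD(n_components=2, random_state=0)
    condenser.fit(X_unlabeled)

    # Map the labeled data into the condensed space and train a supervised model.
    X_train = condenser.transform(vectorizer.transform(train_texts))
    clf = LogisticRegression().fit(X_train, train_labels)

    # Predict on new text using the same condensed representation.
    print(clf.predict(condenser.transform(vectorizer.transform(["the dog barked"]))))

The point of the sketch is only the workflow: the condensed representation is learned from unlabeled data and is far smaller than the original feature set, which is what allows the supervised model to work with very few dense features.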