Part-of-speech induction from scratch

Authors:
Hinrich Schütze
Affiliations:
Center for the Study of Language and Information, Stanford, CA
Venue:
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Year:
1993

Abstract

This paper presents a method for inducing the parts of speech of a language and part-of-speech labels for individual words from a large text corpus. Vector representations for the part-of-speech of a word are formed from entries of its near lexical neighbors. A dimensionality reduction creates a space representing the syntactic categories of unambiguous words. A neural net trained on these spatial representations classifies individual contexts of occurrence of ambiguous words. The method classifies both ambiguous and unambiguous words correctly with high accuracy.
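The first two steps described in the abstract (vectors built from a word's lexical neighbors, followed by a dimensionality reduction) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the toy corpus, the function names, the use of immediate left and right neighbors only, and truncated SVD as the reduction. The neural-net classification of ambiguous words in context is not covered.

```python
import numpy as np


def context_vectors(tokens, vocab):
    """Count immediate left and right neighbors for every vocabulary word.

    Returns an array of shape (len(vocab), 2 * len(vocab)): the first half of
    each row holds left-neighbor counts, the second half right-neighbor counts.
    """
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    left = np.zeros((n, n))
    right = np.zeros((n, n))
    for prev, cur in zip(tokens, tokens[1:]):
        if prev in index and cur in index:
            right[index[prev], index[cur]] += 1  # cur follows prev
            left[index[cur], index[prev]] += 1   # prev precedes cur
    return np.hstack([left, right])


def reduce_dims(matrix, k):
    """Project rows onto the top-k singular directions (truncated SVD)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


# Hypothetical toy corpus: "cat" and "dog" occur in identical contexts.
tokens = "the cat sleeps . the dog sleeps . a cat runs . a dog runs .".split()
vocab = sorted(set(tokens))
idx = {w: i for i, w in enumerate(vocab)}

vectors = context_vectors(tokens, vocab)
reduced = reduce_dims(vectors, k=3)

print("cat~dog:", cosine(reduced[idx["cat"]], reduced[idx["dog"]]))
print("cat~the:", cosine(reduced[idx["cat"]], reduced[idx["the"]]))
```

In this toy setting, words with the same distribution of neighbors receive the same row both before and after the reduction, which is the property a downstream clustering or classification step would rely on. A realistic setup would need a large corpus and a far larger vocabulary, and might use wider context windows.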