A practical solution to the problem of automatic part-of-speech induction from text

Authors:
Reinhard Rapp
Affiliations:
University of Mainz, Germersheim, Germany
Venue:
ACLdemo '05 Proceedings of the ACL 2005 on Interactive poster and demonstration sessions
Year:
2005

Abstract

The problem of part-of-speech induction from text involves two aspects. First, a set of word classes must be derived automatically. Second, each word of a vocabulary must be assigned to one or several of these classes. In this paper we present a method that solves both problems with good accuracy. Our approach combines statistical methods that have previously been applied successfully to word sense induction. Its main advantage over previous attempts is that it reduces the syntactic space to only its most important dimensions. This almost eliminates the otherwise omnipresent problem of data sparseness.
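The abstract does not spell out the pipeline. The following Python code is only a rough, hypothetical sketch of how the two aspects might fit together. It assumes the following design choices:

- Each word is represented by counts of its immediate left and right neighbours among a small set of frequent feature words.
- The counts are log-weighted.
- The counts are reduced to a few dimensions with a truncated SVD. Here it is computed by power iteration to stay dependency-free; a library SVD would be the usual choice.
- Classes are derived by agglomerative clustering.
- A word is assigned to every class whose vector is cosine-similar above a threshold.

The feature choice, the weighting, the clustering method, the threshold of 0.5, and all function names (`context_vectors`, `top_directions`, `reduce_dimensions`, `agglomerate`, `assign`) are assumptions, not the paper's method.

```python
# Hypothetical sketch of POS induction: neighbour vectors -> truncated SVD -> clustering.
# Not the paper's implementation; features, weighting, and clustering method are assumptions.
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def context_vectors(tokens, vocab, features):
    """Log-weighted counts of feature words seen directly left/right of each vocab word."""
    index = {f: i for i, f in enumerate(features)}
    n = len(features)
    counts = {w: [0.0] * (2 * n) for w in vocab}
    for i, w in enumerate(tokens):
        if w not in counts:
            continue
        if i > 0 and tokens[i - 1] in index:
            counts[w][index[tokens[i - 1]]] += 1  # first half: left neighbours
        if i + 1 < len(tokens) and tokens[i + 1] in index:
            counts[w][n + index[tokens[i + 1]]] += 1  # second half: right neighbours
    return {w: [math.log1p(c) for c in vec] for w, vec in counts.items()}


def top_directions(rows, k, iters=200, seed=0):
    """Top-k right singular vectors of the row matrix M, by power iteration on M^T M."""
    rng = random.Random(seed)
    dim = len(rows[0])
    dirs = []
    for _ in range(k):
        v = [rng.random() for _ in range(dim)]
        for _ in range(iters):
            mv = [dot(r, v) for r in rows]  # M v
            w = [sum(x * r[j] for x, r in zip(mv, rows)) for j in range(dim)]  # M^T (M v)
            for d in dirs:  # project out directions already found
                c = dot(w, d)
                w = [a - c * b for a, b in zip(w, d)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                return dirs  # no further non-zero singular values
            v = [a / norm for a in w]
        dirs.append(v)
    return dirs


def reduce_dimensions(vectors, k):
    """Project every word vector onto the k most important directions."""
    words = sorted(vectors)
    dirs = top_directions([vectors[w] for w in words], k)
    return {w: [dot(vectors[w], d) for d in dirs] for w in words}


def class_vector(coords, members):
    # The sum points in the same direction as the mean, which is all cosine needs.
    return [sum(xs) for xs in zip(*(coords[w] for w in members))]


def agglomerate(coords, n_classes):
    """Repeatedly merge the two most cosine-similar clusters until n_classes remain."""
    clusters = [[w] for w in sorted(coords)]
    while len(clusters) > n_classes:
        best, pair = -2.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(class_vector(coords, clusters[i]), class_vector(coords, clusters[j]))
                if s > best:
                    best, pair = s, (i, j)
        i, j = pair
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    return clusters, [class_vector(coords, c) for c in clusters]


def assign(vec, class_vecs, threshold=0.5):
    """Every class at least `threshold` cosine-similar to vec; else the single best class."""
    sims = [cosine(vec, c) for c in class_vecs]
    chosen = [i for i, s in enumerate(sims) if s >= threshold]
    return chosen or [max(range(len(sims)), key=sims.__getitem__)]


tokens = ("the cat sleeps . the dog runs . a cat runs . a dog sleeps . "
          "the bird sings . a bird sleeps .").split()
vocab = ["cat", "dog", "bird", "sleeps", "runs", "sings"]
features = ["the", "a", "."]  # hypothetical: frequent tokens used as context features
coords = reduce_dimensions(context_vectors(tokens, vocab, features), k=2)
classes, class_vecs = agglomerate(coords, n_classes=2)
print(sorted(sorted(c) for c in classes))
# → [['bird', 'cat', 'dog'], ['runs', 'sings', 'sleeps']]
print(assign([1.0, 1.0], class_vecs))  # a vector between both classes gets both
# → [0, 1]
```

On this toy corpus, one reduced dimension is shared by the words that follow a determiner. The other is shared by the words that precede a full stop. As a result, the two derived classes separate the nouns from the verbs. The threshold in `assign` is what lets a word fall into more than one class; a plain nearest-class rule would always pick exactly one.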