Language independent, minimally supervised induction of lexical probabilities

Authors: Silviu Cucerzan; David Yarowsky
Affiliations: Johns Hopkins University, Baltimore, MD; Johns Hopkins University, Baltimore, MD
Venue: ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Year: 2000

Abstract

A central problem in part-of-speech tagging, especially for new languages with limited annotated resources, is estimating the distribution of lexical probabilities for unknown words. This paper introduces a new paradigmatic similarity measure and presents a minimally supervised learning approach that combines effective selection and weighting methods. These methods are based on paradigmatic and contextual similarity measures populated from large quantities of inexpensive raw text data. The approach is highly language independent: shifting between languages such as French and English requires no modification to the algorithm or implementation.
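
The general idea of estimating lexical probabilities for an unknown word from similar known words can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes a plain cosine similarity over immediate left and right neighbour counts as a stand-in for contextual similarity, a simple top-k cutoff as the selection step, and similarity-proportional weights as the weighting step. The paradigmatic similarity measure is not modelled at all. All names (`context_vector`, `estimate_tag_distribution`, `top_k`) and the toy data are hypothetical.

```python
from collections import Counter
import math


def context_vector(word, sentences):
    """Count the immediate left/right neighbours of `word` in tokenized raw text."""
    vec = Counter()
    for sent in sentences:
        for i, tok in enumerate(sent):
            if tok != word:
                continue
            if i > 0:
                vec["L:" + sent[i - 1]] += 1
            if i + 1 < len(sent):
                vec["R:" + sent[i + 1]] += 1
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def estimate_tag_distribution(unknown, lexicon, sentences, top_k=2):
    """Estimate P(tag | unknown) as a similarity-weighted average of the tag
    distributions of the top_k most contextually similar known words.
    (Assumed scheme for illustration only.)"""
    target = context_vector(unknown, sentences)
    scored = [(cosine(target, context_vector(w, sentences)), w) for w in lexicon]
    # Selection: keep the top_k known words with non-zero similarity.
    scored = sorted(((s, w) for s, w in scored if s > 0), reverse=True)[:top_k]
    total = sum(s for s, _ in scored)
    if total == 0:
        return {}
    # Weighting: each selected word contributes in proportion to its similarity.
    dist = Counter()
    for s, w in scored:
        for tag, p in lexicon[w].items():
            dist[tag] += s * p / total
    return dict(dist)


# Toy raw text and a tiny seed lexicon of known words (both invented).
sentences = [
    ["the", "dog", "runs"],
    ["the", "cat", "runs"],
    ["the", "fox", "runs"],
    ["dogs", "run", "fast"],
    ["cats", "walk", "fast"],
]
lexicon = {
    "dog": {"NN": 1.0},
    "cat": {"NN": 1.0},
    "run": {"VB": 0.8, "NN": 0.2},
}

fox_dist = estimate_tag_distribution("fox", lexicon, sentences)
walk_dist = estimate_tag_distribution("walk", lexicon, sentences)
```

On this toy data, "fox" shares its contexts only with "dog" and "cat", so all of its estimated probability mass falls on NN, while "walk" shares a context only with "run" and inherits that word's distribution. Nothing in the sketch depends on the language of the text, which loosely mirrors the language-independence claim, though a realistic system would need far more raw text and a richer similarity measure.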