Tagging English text with a probabilistic model

Authors:
Bernard Merialdo
Affiliations:
Institut EURECOM
Venue:
Computational Linguistics
Year:
1994

Abstract

In this paper we present some experiments on the use of a probabilistic model to tag English text, i.e. to assign to each word the correct tag (part of speech) in the context of the sentence. The main novelty of these experiments is the use of untagged text in the training of the model. We have used a simple triclass Markov model and are looking for the best way to estimate the parameters of this model, depending on the kind and amount of training data provided. Two approaches in particular are compared and combined: (1) using text that has been tagged by hand and computing relative frequency counts; (2) using text without tags and training the model as a hidden Markov process, according to a Maximum Likelihood principle. Experiments show that the best training is obtained by using as much tagged text as possible. They also show that Maximum Likelihood training, the procedure that is routinely used to estimate hidden Markov model parameters from training data, will not necessarily improve the tagging accuracy. In fact, it will generally degrade this accuracy, except when only a limited amount of hand-tagged text is available.
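
As a rough illustration of the first approach named in the abstract (relative frequency counts over hand-tagged text), the sketch below estimates the two parameter tables of a triclass model: the probability of a tag given the two preceding tags, and the probability of a word given its tag. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name, the `<s>` padding symbol, the toy tag names, and the absence of any smoothing are all choices made here for illustration.

```python
from collections import Counter


def relative_frequency_estimates(tagged_sentences, pad="<s>"):
    """Estimate triclass-model parameters by relative frequency counts.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    pad: hypothetical start-of-sentence symbol (an assumption of this sketch).

    Returns (transition, emission) where
      transition[(t1, t2, t3)] approximates p(t3 | t1, t2)
      emission[(tag, word)]    approximates p(word | tag)
    No smoothing is applied, so unseen events simply have no entry.
    """
    trigram_counts = Counter()
    context_counts = Counter()
    emission_counts = Counter()
    tag_counts = Counter()

    for sentence in tagged_sentences:
        # Pad with two start symbols so the first tag has a full context.
        tags = [pad, pad] + [tag for _, tag in sentence]
        for i in range(2, len(tags)):
            trigram_counts[(tags[i - 2], tags[i - 1], tags[i])] += 1
            context_counts[(tags[i - 2], tags[i - 1])] += 1
        for word, tag in sentence:
            emission_counts[(tag, word)] += 1
            tag_counts[tag] += 1

    # Relative frequency: count of the event divided by count of its context.
    transition = {
        trigram: count / context_counts[trigram[:2]]
        for trigram, count in trigram_counts.items()
    }
    emission = {
        pair: count / tag_counts[pair[0]]
        for pair, count in emission_counts.items()
    }
    return transition, emission


# Toy hand-tagged corpus (invented for illustration only).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("barks", "NOUN")],
]
transition, emission = relative_frequency_estimates(corpus)
print(transition[("<s>", "DET", "NOUN")])
print(emission[("NOUN", "barks")])
```

The second approach in the abstract would start from tables like these and re-estimate them on untagged text by Maximum Likelihood training; that step is not shown here.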