Empirical estimates of adaptation: the chance of two Noriegas is closer to p/2 than p²

Authors:
Kenneth W. Church
Affiliations:
AT&T Labs-Research, Florham Park, NJ.
Venue:
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Year:
2000

Abstract

Repetition is very common. Adaptive language models, which allow probabilities to change or adapt after seeing just a few words of a text, were introduced in speech recognition to account for text cohesion. Suppose a document mentions Noriega once. What is the chance that he will be mentioned again? If the first instance has probability p, then under standard (bag-of-words) independence assumptions, two instances ought to have probability p², but we find the probability is actually closer to p/2. The first mention of a word obviously depends on frequency, but surprisingly, the second does not. Adaptation depends more on lexical content than frequency; there is more adaptation for content words (proper nouns, technical terminology and good keywords for information retrieval), and less adaptation for function words, cliches and ordinary first names.
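
The comparison between p² and p/2 can be made concrete with document counts. The following is a minimal sketch, not necessarily the procedure used in the paper: it assumes p is estimated as the fraction of documents containing the word at least once, and the chance of two instances as the fraction containing it at least twice. The function name `adaptation_estimates` and the five toy documents are invented for illustration and are not data from the paper.

```python
from collections import Counter


def adaptation_estimates(docs, word):
    """Estimate mention probabilities for `word` over tokenized documents.

    Returns a tuple (p, p_two, p_again):
      p       -- fraction of documents with at least one instance
      p_two   -- fraction of documents with at least two instances
      p_again -- fraction of mentioning documents that mention it again
    """
    n_docs = len(docs)
    counts = [Counter(doc)[word] for doc in docs]
    df1 = sum(1 for c in counts if c >= 1)  # documents with 1+ instances
    df2 = sum(1 for c in counts if c >= 2)  # documents with 2+ instances
    p = df1 / n_docs
    p_two = df2 / n_docs
    p_again = df2 / df1 if df1 else 0.0
    return p, p_two, p_again


# Toy corpus (made up): one document mentions the word twice, one once.
docs = [
    "noriega said noriega would leave".split(),
    "the general met noriega".split(),
    "markets fell sharply on monday".split(),
    "the senate voted on the bill".split(),
    "rain is expected this weekend".split(),
]

p, p_two, p_again = adaptation_estimates(docs, "noriega")
print(f"p = {p:.2f}")
print(f"independence prediction p^2 = {p * p:.2f}")
print(f"observed chance of two instances = {p_two:.2f}")
print(f"p/2 = {p / 2:.2f}")
print(f"chance of a second mention given a first = {p_again:.2f}")
```

The toy counts only show how the quantities are computed; they are not evidence for the finding. On a real corpus, the abstract's claim corresponds to `p_two` falling much closer to `p / 2` than to `p * p` for content words, which amounts to `p_again` staying near 1/2 regardless of how small `p` is.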