This paper presents a model-based, unsupervised algorithm for recovering word boundaries in a natural-language text from which they have been deleted. The algorithm is derived from a probability model of the source that generated the text. The fundamental structure of the model is specified abstractly so that the detailed component models of phonology, word-order, and word frequency can be replaced in a modular fashion. The model yields a language-independent, prior probability distribution on all possible sequences of all possible words over a given alphabet, based on the assumption that the input was generated by concatenating words from a fixed but unknown lexicon. The model is unusual in that it treats the generation of a complete corpus, regardless of length, as a single event in the probability space. Accordingly, the algorithm does not estimate a probability distribution on words; instead, it attempts to calculate the prior probabilities of various word sequences that could underlie the observed text. Experiments on phonemic transcripts of spontaneous speech by parents to young children suggest that our algorithm is more effective than other proposed algorithms, at least when utterance boundaries are given and the text includes a substantial number of short utterances.
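To make the hypothesis space concrete, the following minimal Python sketch enumerates every word sequence that could underlie a short boundary-free string and picks the one preferred by a caller-supplied score. It is an illustration only, not the paper's model or search procedure: the names `segmentations`, `best_segmentation`, and `toy_score` are hypothetical, and `toy_score` is an assumed stand-in (it simply favors few tokens and a small inventory of distinct words) rather than the prior probability the abstract describes. Exhaustive enumeration is also exponential in the string length, so it is only workable for tiny inputs.

```python
from itertools import combinations
from typing import Callable, Iterator, List


def segmentations(text: str) -> Iterator[List[str]]:
    """Yield every split of `text` into contiguous, non-empty words."""
    n = len(text)
    if n == 0:
        return
    # A boundary may be placed between any two adjacent characters,
    # so a string of length n has 2 ** (n - 1) segmentations.
    for k in range(n):  # k = number of internal boundaries
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [text[a:b] for a, b in zip(bounds, bounds[1:])]


def best_segmentation(text: str, score: Callable[[List[str]], float]) -> List[str]:
    """Return the candidate word sequence with the highest score."""
    return max(segmentations(text), key=score)


def toy_score(words: List[str]) -> float:
    """Hypothetical score, NOT the paper's prior: penalize the total
    length of the distinct words plus the number of word tokens."""
    lexicon_cost = sum(len(w) for w in set(words))
    return -(lexicon_cost + len(words))


if __name__ == "__main__":
    for candidate in segmentations("abab"):
        print(candidate, toy_score(candidate))
    print("best:", best_segmentation("abab", toy_score))
```

Passing the score in as a function mirrors, very loosely, the modularity the abstract mentions: the enumeration of candidate word sequences stays fixed while the component that evaluates them can be swapped.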