We present two methods for the unsupervised segmentation of words into morpheme-like units. The underlying model is particularly suited to languages with rich morphology, such as Finnish. The first method is based on the Minimum Description Length (MDL) principle and works online; the second uses Maximum Likelihood (ML) optimization. Segmentation quality is measured by comparing the produced segmentations against an existing morphological analysis. Experiments on Finnish and English corpora show that both methods perform well compared with a current state-of-the-art system.
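To give a rough sense of how an MDL criterion can favor splitting words into reusable units, the following is a minimal sketch of a toy two-part code length. It is an illustration under stated assumptions, not the cost function used in the paper: the function name `description_length`, the flat `bits_per_letter` lexicon cost, and the small Finnish-like word list are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def description_length(segmented_corpus, bits_per_letter=5.0):
    """Toy two-part code length, in bits, of a segmented corpus.

    segmented_corpus: list of words, each given as a list of morph strings.
    bits_per_letter: assumed flat cost of spelling one letter in the lexicon
                     (a hypothetical simplification, not taken from the paper).
    """
    counts = Counter(m for word in segmented_corpus for m in word)
    total = sum(counts.values())
    # Part 1: spell out each distinct morph once, plus one end-of-morph marker.
    lexicon_bits = sum((len(m) + 1) * bits_per_letter for m in counts)
    # Part 2: encode the corpus as morph tokens with maximum-likelihood
    # probabilities estimated from the corpus itself.
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_bits + corpus_bits


# Hypothetical toy data: the same six word forms, unsegmented and segmented.
whole = [["talo"], ["talossa"], ["talosta"], ["auto"], ["autossa"], ["autosta"]]
split = [["talo"], ["talo", "ssa"], ["talo", "sta"],
         ["auto"], ["auto", "ssa"], ["auto", "sta"]]

cost_whole = description_length(whole)
cost_split = description_length(split)
print(f"unsegmented: {cost_whole:.1f} bits, segmented: {cost_split:.1f} bits")
```

In this toy setting the segmented corpus is cheaper to describe, because the stems and endings are stored once in the lexicon and reused across words. A real system would also need a search procedure over candidate segmentations, which this sketch does not attempt.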