Memory-Based Lexical Acquisition and Processing
Proceedings of the Third International EAMT Workshop on Machine Translation and the Lexicon
Unsupervised learning of the morphology of a natural language
Computational Linguistics
Acquiring receptive morphology: a connectionist model
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
On the Statistical Properties of the F-measure
QSIC '04 Proceedings of the Quality Software, Fourth International Conference
Unsupervised segmentation of words using prior distributions of morph length and frequency
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Unsupervised learning of morphology for building lexicon for a highly inflectional language
MPL '02 Proceedings of the ACL-02 workshop on Morphological and phonological learning - Volume 6
Unsupervised learning of morphology using a novel directed search algorithm: taking the first step
MPL '02 Proceedings of the ACL-02 workshop on Morphological and phonological learning - Volume 6
Induction of a simple morphology for highly-inflecting languages
SIGMorPhon '04 Proceedings of the 7th Meeting of the ACL Special Interest Group in Computational Phonology: Current Themes in Computational Phonology and Morphology
Part of speech tagger for Assamese text
ACLShort '09 Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
Analysis and evaluation of stemming algorithms: a case study with Assamese
Proceedings of the International Conference on Advances in Computing, Communications and Informatics
An improved stemming approach using HMM for a highly inflectional language
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
This article describes an approach to unsupervised learning of morphology from an unannotated corpus for a highly inflectional Indo-European language called Assamese spoken by about 30 million people. Although Assamese is one of India's national languages, it utterly lacks computational linguistic resources. There exists no prior computational work on this language spoken widely in northeast India. The work presented is pioneering in this respect. In this article, we discuss salient issues in Assamese morphology where the presence of a large number of suffixal determiners, sandhi, samas, and the propensity to use suffix sequences make approximately 50% of the words used in written and spoken text inflected. We implement methods proposed by Gaussier and Goldsmith on acquisition of morphological knowledge, and obtain F-measure performance below 60%. This motivates us to present a method more suitable for handling suffix sequences, enabling us to increase the F-measure performance of morphology acquisition to almost 70%. We describe how we build a morphological dictionary for Assamese from the text corpus. Using the morphological knowledge acquired and the morphological dictionary, we are able to process small chunks of data at a time as well as a large corpus. We achieve approximately 85% precision and recall during the analysis of small chunks of coherent text.
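The abstract reports results as precision, recall, and F-measure without stating how the F-measure is computed. As a minimal sketch, assuming the standard balanced F-measure (the harmonic mean of precision and recall), the relationship between the three figures can be written as follows. The function name `f_measure` and the `beta` parameter are illustrative choices, not taken from the article.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    Assumption: the standard F-beta definition; beta = 1 gives the
    balanced F-measure. The article may use a different weighting.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    denominator = beta ** 2 * precision + recall
    if denominator == 0.0:
        # Both precision and recall are zero: define the score as zero.
        return 0.0
    return (1.0 + beta ** 2) * precision * recall / denominator


if __name__ == "__main__":
    # With precision and recall both near 0.85, as reported for small
    # chunks of coherent text, the balanced F-measure is also about 0.85.
    print(round(f_measure(0.85, 0.85), 2))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a system cannot reach a high F-measure by maximizing precision alone or recall alone.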