Sinhala grapheme-to-phoneme conversion and rules for schwa epenthesis
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
A rule based syllabification algorithm for sinhala
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
Hi-index | 0.00 |
Lexicon is in important resource in any kind of language processing application. Corpus-based lexica have several advantages over other traditional approaches. The lexicon developed for Sinhala was based on the text obtained from a corpus of 10 million words drawn from diverse genres. The words extracted from the corpus have been labeled with parts of speech categories defined according to a novel classification proposed for Sinhala. The lexicon reports 80% coverage over unrestricted text obtained from online sources. The lexicon has been implemented in Lexical Mark up Framework.