Corpus-based Sinhala lexicon

Authors:
Ruvan Weerasinghe; Dulip Herath; Viraj Welgama
Affiliations:
University of Colombo School of Computing, Colombo, Sri Lanka (all authors)
Venue:
ALR7 Proceedings of the 7th Workshop on Asian Language Resources
Year:
2009

Abstract

A lexicon is an important resource in any kind of language processing application. Corpus-based lexica have several advantages over those built by more traditional approaches. The lexicon developed for Sinhala is based on text from a corpus of 10 million words drawn from diverse genres. The words extracted from the corpus have been labeled with part-of-speech categories defined according to a novel classification proposed for Sinhala. The lexicon achieves 80% coverage over unrestricted text obtained from online sources. It has been implemented in the Lexical Markup Framework (LMF).
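
The abstract does not say how coverage was measured. As a minimal sketch, assuming coverage means the share of running words (tokens) in a text that appear in the lexicon, it could be computed as below. The function names, the toy lexicon, and the placeholder tokens are hypothetical and are not taken from the paper; a type-level variant (share of distinct words) is shown alongside because the two figures can differ.

```python
def token_coverage(tokens, lexicon):
    """Fraction of running tokens that are found in the lexicon."""
    if not tokens:
        return 0.0
    known = sum(1 for tok in tokens if tok in lexicon)
    return known / len(tokens)


def type_coverage(tokens, lexicon):
    """Fraction of distinct word types that are found in the lexicon."""
    types = set(tokens)
    if not types:
        return 0.0
    return len(types & lexicon) / len(types)


# Toy data (hypothetical placeholders, not real lexicon entries).
lexicon = {"w1", "w2", "w3"}
tokens = ["w1", "w2", "w1", "unknown", "w3"]

print(f"token coverage: {token_coverage(tokens, lexicon):.2f}")
print(f"type coverage:  {type_coverage(tokens, lexicon):.2f}")
```

Token-level coverage weights frequent words more heavily, so it is usually higher than type-level coverage on the same text; which of the two the reported 80% refers to is not stated here.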