A Malay stemmer for Jawi characters

  • Authors:
  • Suliana Sulaiman; Khairuddin Omar; Nazlia Omar; Mohd Zamri Murah; Hamdan Abdul Rahman

  • Affiliations:
  • Fakulti Seni, Komputeran dan Industri Kreatif, Universiti Pendidikan Sultan Idris, Tanjong Malim, Malaysia;Fakulti Teknologi dan Sains Maklumat, Universiti Kebangsaan Malaysia, Bangi, Malaysia;Fakulti Teknologi dan Sains Maklumat, Universiti Kebangsaan Malaysia, Bangi, Malaysia;Fakulti Teknologi dan Sains Maklumat, Universiti Kebangsaan Malaysia, Bangi, Malaysia;Fakulti Teknologi dan Sains Maklumat, Universiti Kebangsaan Malaysia, Bangi, Malaysia

  • Venue:
  • AI'11 Proceedings of the 24th international conference on Advances in Artificial Intelligence
  • Year:
  • 2011

Abstract

The Malay language can be written in either Roman or Jawi characters. Most Malay stemmers cover only Roman (Rumi) affixes. This paper proposes a stemmer for Jawi characters that uses two sets of Jawi rules: one set stems the various forms of derived words, and the other replaces a dictionary lookup by producing the root word for each derivative. The stemmer was tested on 1185 derived words containing prefixes, circumfixes, suffixes, and infixes. The results show that 84.89% of the Jawi root words were stemmed successfully.
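
The two-rule-set idea described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' rules or algorithm: the names `STRIP_RULES`, `ROOT_RULES`, `MIN_ROOT_LEN`, and `stem` are hypothetical, the three affix rules are a hand-picked toy subset, and the abstract does not say how the second rule set actually produces the root. Here it is assumed to restore a letter that the prefix absorbed.

```python
from typing import Dict, List, Tuple

# Rule set 1 (hypothetical): affixes to strip, as (rule_id, prefix, suffix).
# A toy subset chosen for illustration; not the rule set from the paper.
STRIP_RULES: List[Tuple[str, str, str]] = [
    ("men", "من", ""),  # prefix, romanised "men-"
    ("ber", "بر", ""),  # prefix, romanised "ber-"
    ("an", "", "ن"),    # suffix, romanised "-an"
]

# Rule set 2 (hypothetical): a letter to restore at the front of the
# stripped stem, standing in for a dictionary lookup. Assumption: "men-"
# absorbs an initial "t" of the root, so it is put back after stripping.
ROOT_RULES: Dict[str, str] = {"men": "ت"}

# Assumed guard against stripping a word down to almost nothing.
MIN_ROOT_LEN = 3


def stem(word: str) -> str:
    """Apply the first matching strip rule, then the matching root rule."""
    for rule_id, prefix, suffix in STRIP_RULES:
        if not (word.startswith(prefix) and word.endswith(suffix)):
            continue
        end = len(word) - len(suffix)
        core = word[len(prefix):end]
        root = ROOT_RULES.get(rule_id, "") + core
        if len(root) >= MIN_ROOT_LEN:
            return root
    return word  # no rule applied; return the word unchanged


if __name__ == "__main__":
    for derived in ["برجالن", "ماکنن", "منوليس"]:
        print(derived, "->", stem(derived))
```

With these toy rules, the Jawi forms of "berjalan", "makanan", and "menulis" reduce to "jalan", "makan", and "tulis". The sketch is deliberately naive: because it has no dictionary, it also strips the final letter of a root that merely ends in the suffix letter, which is the kind of over-stemming a fuller rule set would need to constrain.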