Semantic similarity measures for Malay sentences

  • Authors:
  • Shahrul Azman Noah;Amru Yusrin Amruddin;Nazlia Omar

  • Affiliations:
  • Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor;Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor;Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor

  • Venue:
  • ICADL'07 Proceedings of the 10th international conference on Asian digital libraries: looking back 10 years and forging new frontiers
  • Year:
  • 2007

Abstract

Semantic similarity is an important element in many applications, such as information extraction, information retrieval, document clustering and ontology learning. Most previous semantic similarity measures have been defined between words or concepts (i.e. word-to-word similarity), thus ignoring the text or sentence in which the concepts participate. Semantic text similarity was made possible by the availability of semantic lexicons such as WordNet for English and GermaNet for German. For languages such as Malay, however, text similarity has proved difficult because similar resources are unavailable. This paper describes our approach to text similarity in the Malay language. We used a preprocessed Malay dictionary and an overlap edge-counting-based method to first calculate word-to-word semantic similarity. The word-to-word measure is then used to compute semantic sentence similarity using a modified version of an approach developed for English. Results of the experiments are very encouraging and indicate the potential of semantic similarity measures for Malay sentences.
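
The two-stage idea in the abstract (score word pairs first, then aggregate the scores into a sentence score) can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not give their formulas, so the sketch swaps in a plain Jaccard overlap of dictionary gloss terms for the word-to-word step and a symmetric best-match average for the sentence step. The function names and the tiny `TOY_GLOSSES` table are hypothetical and hand-made for illustration only.

```python
def gloss_overlap(word_a, word_b, glosses):
    """Toy word-to-word score: Jaccard overlap of the two words' gloss terms.

    Assumption: stands in for the paper's overlap edge-counting measure,
    whose exact definition is not given in the abstract.
    """
    if word_a == word_b:
        return 1.0
    terms_a = glosses.get(word_a, set())
    terms_b = glosses.get(word_b, set())
    if not terms_a or not terms_b:
        return 0.0  # unknown word: no evidence of similarity
    return len(terms_a & terms_b) / len(terms_a | terms_b)


def directed_score(source, target, glosses):
    """Average, over words in source, of the best match found in target."""
    if not source or not target:
        return 0.0
    best = [max(gloss_overlap(s, t, glosses) for t in target) for s in source]
    return sum(best) / len(best)


def sentence_similarity(sentence_a, sentence_b, glosses):
    """Symmetric sentence score: mean of the two directed scores."""
    tokens_a = sentence_a.lower().split()
    tokens_b = sentence_b.lower().split()
    return 0.5 * (directed_score(tokens_a, tokens_b, glosses)
                  + directed_score(tokens_b, tokens_a, glosses))


# Hypothetical gloss terms, invented for this example (not from a real dictionary).
TOY_GLOSSES = {
    "kereta": {"kenderaan", "roda", "enjin"},
    "motosikal": {"kenderaan", "roda", "dua", "enjin"},
    "buku": {"kertas", "tulisan"},
}

if __name__ == "__main__":
    print(sentence_similarity("buku kereta", "buku motosikal", TOY_GLOSSES))
```

Averaging the two directed scores keeps the measure symmetric, so the result does not depend on which sentence is passed first; a real system would also need Malay-specific tokenization and stemming, which this sketch omits.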