Semantic similarity measures for Malay sentences

Authors:
Shahrul Azman Noah;Amru Yusrin Amruddin;Nazlia Omar
Affiliations:
Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor;Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor;Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor
Venue:
ICADL'07 Proceedings of the 10th international conference on Asian digital libraries: looking back 10 years and forging new frontiers
Year:
2007

Abstract

Semantic similarity is an important element in many applications, such as information extraction, information retrieval, document clustering and ontology learning. Most previous semantic similarity measures have been defined between words or concepts (i.e. word-to-word similarity), ignoring the text or sentence in which the concepts occur. Semantic text similarity became possible with the availability of semantic lexicons such as WordNet for English and GermaNet for German. For languages such as Malay, however, text similarity has proved difficult because comparable resources are unavailable. This paper describes our approach to text similarity for the Malay language. We use a preprocessed Malay dictionary and an overlap-based edge-counting method to first calculate word-to-word semantic similarity. The word-to-word measure is then used to compute semantic sentence similarity, using a modified version of an approach developed for English. Results of the experiments are very encouraging and indicate the potential of semantic similarity measures for Malay sentences.
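
The two-stage idea in the abstract (word-to-word similarity first, then sentence similarity built on top of it) can be illustrated with a minimal Python sketch. This is not the authors' method: the abstract does not give the formulas, so the sketch substitutes a plain Jaccard overlap of dictionary definitions for the word-level measure and a symmetric best-match average for the sentence-level measure. The dictionary `TOY_DICTIONARY`, its made-up definitions, and the function names `word_similarity` and `sentence_similarity` are all hypothetical and exist only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy dictionary (an assumption, not taken from any real resource):
# each headword maps to the set of already-stemmed words in its definition.
TOY_DICTIONARY: Dict[str, Set[str]] = {
    "kereta": {"kenderaan", "roda", "enjin"},
    "motosikal": {"kenderaan", "roda", "enjin", "dua"},
    "buku": {"helaian", "kertas", "tulisan"},
    "majalah": {"terbitan", "kertas", "tulisan", "berkala"},
}


def word_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two words' definition sets (stand-in measure)."""
    if a == b:
        return 1.0
    gloss_a = TOY_DICTIONARY.get(a, set())
    gloss_b = TOY_DICTIONARY.get(b, set())
    if not gloss_a or not gloss_b:
        # Words missing from the dictionary are treated as unrelated.
        return 0.0
    return len(gloss_a & gloss_b) / len(gloss_a | gloss_b)


def _directed(source: List[str], target: List[str]) -> float:
    """Average, over source words, of the best similarity to any target word."""
    return sum(max(word_similarity(w, t) for t in target) for w in source) / len(source)


def sentence_similarity(s1: List[str], s2: List[str]) -> float:
    """Symmetric best-match average over two tokenized sentences."""
    if not s1 or not s2:
        return 0.0
    return 0.5 * (_directed(s1, s2) + _directed(s2, s1))


if __name__ == "__main__":
    print(word_similarity("kereta", "motosikal"))
    print(sentence_similarity(["kereta", "buku"], ["motosikal", "majalah"]))
```

In this sketch the sentence score depends only on the word-level function, so a different word measure (for example, one based on path lengths in a lexicon) could be swapped in without changing `sentence_similarity`.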