On retrieval performance of Malay textual documents

  • Authors:
  • Mohd Pouzi Hamzah;Tengku Mohd Tengku Sembok

  • Affiliations:
  • Department Of Computer Science, Kolej Universiti Sains & Teknologi Malaysia, Malaysia;Department Of Information Science, Universiti Kebangsaan Malaysia, Malaysia

  • Venue:
  • AIA'06 Proceedings of the 24th IASTED international conference on Artificial intelligence and applications
  • Year:
  • 2006

Abstract

This paper analyzes the effect of two factors on the retrieval performance of Malay textual documents: similarity measures and word conflation. Three similarity measures, namely the inner product with unweighted query terms, the inner product with weighted query terms, and the cosine of the angle between the query and document vectors, were studied and tested on a Malay test collection. The results show that the cosine measure significantly outperforms the other two similarity measures. To further enhance performance, the data were conflated using Malay stemming algorithms. The conflated data, combined with the cosine measure as the basis for calculating similarity in the vector space, show a significant improvement in terms of precision.
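
The three similarity measures named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's term-weighting scheme, so this example assumes raw term frequencies as weights and whitespace tokenization; the function names and the toy strings are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a term-frequency vector (assumed weighting: raw counts)."""
    return Counter(text.lower().split())


def inner_product_unweighted(query, doc):
    """Inner product where every distinct query term has weight 1."""
    return sum(doc[term] for term in set(query))


def inner_product_weighted(query, doc):
    """Inner product using the query's own term weights."""
    return sum(weight * doc[term] for term, weight in query.items())


def cosine(query, doc):
    """Cosine of the angle between the query and document vectors."""
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    norm_d = math.sqrt(sum(w * w for w in doc.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # an empty vector has no direction
    return inner_product_weighted(query, doc) / (norm_q * norm_d)


# Toy example (hypothetical data): "ikan" = fish, "laut" = sea, "segar" = fresh
query = term_vector("ikan ikan laut")
doc = term_vector("ikan laut laut segar")

print(inner_product_unweighted(query, doc))
print(inner_product_weighted(query, doc))
print(round(cosine(query, doc), 4))
```

Unlike the two inner products, the cosine measure divides by the vector lengths, so a long document does not score higher simply because it contains more term occurrences.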