On retrieval performance of Malay textual documents

Authors:
Mohd Pouzi Hamzah;Tengku Mohd Tengku Sembok
Affiliations:
Department of Computer Science, Kolej Universiti Sains & Teknologi Malaysia, Malaysia;Department of Information Science, Universiti Kebangsaan Malaysia, Malaysia
Venue:
AIA'06 Proceedings of the 24th IASTED international conference on Artificial intelligence and applications
Year:
2006

Abstract

This paper analyzes two factors that affect the retrieval performance of Malay textual documents: similarity measures and conflation of words. Three similarity measures, namely the inner product for unweighted query terms, the inner product for weighted query terms, and the cosine of the angle between the query and document vectors, were studied and tested on a Malay test collection. The results show that the cosine measure significantly outperforms the other two similarity measures. To further enhance performance, the data were conflated using Malay stemming algorithms. Using the conflated data with the cosine measure as the basis for calculating similarity in the vector space yields a significant improvement in terms of precision.
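
The three similarity measures named in the abstract can be illustrated with a minimal sketch. The abstract does not state the term-weighting scheme used in the paper, so raw term frequencies are assumed here; the function names and the toy Malay tokens are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def term_weights(tokens):
    """Raw term-frequency weights (an assumption, not the paper's scheme)."""
    return Counter(tokens)


def inner_product_unweighted(query_tokens, doc_weights):
    """Inner product with a binary query: each distinct query term counts once."""
    return sum(doc_weights.get(term, 0) for term in set(query_tokens))


def inner_product_weighted(query_weights, doc_weights):
    """Inner product where query terms carry their own weights."""
    return sum(w * doc_weights.get(term, 0) for term, w in query_weights.items())


def cosine(query_weights, doc_weights):
    """Cosine of the angle between the query and document vectors."""
    dot = inner_product_weighted(query_weights, doc_weights)
    q_norm = math.sqrt(sum(w * w for w in query_weights.values()))
    d_norm = math.sqrt(sum(w * w for w in doc_weights.values()))
    if q_norm == 0 or d_norm == 0:
        return 0.0  # an empty vector has no direction; treat as no match
    return dot / (q_norm * d_norm)


# Toy data (hypothetical): a short on-topic document and a long, mostly off-topic one.
query = ["ikan", "ikan", "laut"]
doc_short = term_weights(["ikan", "laut"])
doc_long = term_weights(["ikan"] * 4 + ["laut"] + ["pasar"] * 10)
query_w = term_weights(query)

for name, doc in (("short", doc_short), ("long", doc_long)):
    print(
        name,
        inner_product_unweighted(query, doc),
        inner_product_weighted(query_w, doc),
        round(cosine(query_w, doc), 4),
    )
```

In this toy case both inner products score the long document higher simply because it repeats a query term more often, while the cosine, which normalizes by vector length, ranks the short on-topic document first. This shows a general property of length normalization only; it does not reproduce the paper's experiments or results.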