This paper analyzes the effect of two factors on the retrieval performance of Malay text documents: similarity measures and word conflation. Three similarity measures were studied and tested on a Malay test collection: the inner product with unweighted query terms, the inner product with weighted query terms, and the cosine of the angle between the query and document vectors. The cosine measure significantly outperforms the other two. To further improve performance, the data were conflated using Malay stemming algorithms. Using the conflated data with the cosine measure as the basis for computing similarity in the vector space yields a significant improvement in precision.
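The three similarity measures named above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes queries and documents are represented as sparse term-to-weight dictionaries, that "unweighted query terms" means each query term counts as 1, and the function names and sample weights are hypothetical.

```python
import math

def inner_product_unweighted(query, doc):
    # Assumption: every term present in the query counts as 1,
    # so the score is the sum of the document weights of matching terms.
    return sum(doc.get(term, 0.0) for term in query)

def inner_product_weighted(query, doc):
    # Sum of query weight times document weight over shared terms.
    return sum(w * doc.get(term, 0.0) for term, w in query.items())

def cosine(query, doc):
    # Weighted inner product normalized by both vector lengths.
    q_norm = math.sqrt(sum(w * w for w in query.values()))
    d_norm = math.sqrt(sum(w * w for w in doc.values()))
    if q_norm == 0.0 or d_norm == 0.0:
        return 0.0
    return inner_product_weighted(query, doc) / (q_norm * d_norm)

# Hypothetical term weights, for illustration only.
query = {"a": 2.0, "b": 1.0}
doc = {"a": 1.0, "c": 3.0}
scores = (
    inner_product_unweighted(query, doc),
    inner_product_weighted(query, doc),
    cosine(query, doc),
)
```

Unlike the two inner products, the cosine score is unaffected by the length of the vectors, so a long document does not score higher merely by containing more or heavier terms.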