Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Information Processing and Management: an International Journal - Special issue: history of information science
Introduction to Information Retrieval
Cluster Analysis
Semitic '04 Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages
Understanding the thematic structure of the Qur'an: an exploratory multivariate approach
ACLstudent '05 Proceedings of the ACL Student Research Workshop
Thabet [2005] applied cluster analysis to the Qur’an in the hope of generating a classification of the suras that is useful for understanding its thematic structure. The result was positive, but variation in sura length was a problem because clustering of the shorter suras was found to be unreliable. The present discussion addresses this problem in four parts. The first part summarizes Thabet’s work. The second part argues that unreliable clustering of the shorter suras is a consequence of poor estimation of lexical population probabilities in those suras. The third part proposes a solution to the problem: a minimum length threshold is calculated using concepts from statistical sampling theory, and suras and lexical variables are then selected on the basis of that threshold. The fourth part applies the proposed solution to a reanalysis of the Qur’an.
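
The abstract does not say how the threshold is computed. As a rough illustration of the general idea only, the sketch below uses the textbook sample-size formula for estimating a binomial proportion (normal approximation). That formula, the function names, the relative-error tolerance and the example figures are all assumptions made here for illustration; they are not taken from the paper.

```python
import math


def min_length_for_proportion(p, rel_error=0.1, z=1.96):
    """Smallest text length n (in word tokens) at which a word with
    population probability p is estimated to within +/- rel_error * p,
    at the confidence level implied by z (1.96 is roughly 95%).

    Assumption: the standard formula n = z^2 * p * (1 - p) / e^2,
    with absolute error e = rel_error * p. This is not the paper's formula.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    abs_error = rel_error * p
    return math.ceil(z * z * p * (1.0 - p) / abs_error ** 2)


def select_texts_and_words(text_lengths, word_probs, threshold,
                           rel_error=0.1, z=1.96):
    """Keep texts at least `threshold` tokens long, and keep words whose
    required length does not exceed `threshold`.

    text_lengths: dict mapping a hypothetical text name to its token count
    word_probs:   dict mapping a word to its estimated probability
    """
    kept_texts = [name for name, n in text_lengths.items() if n >= threshold]
    kept_words = [
        word for word, p in word_probs.items()
        if min_length_for_proportion(p, rel_error, z) <= threshold
    ]
    return kept_texts, kept_words


if __name__ == "__main__":
    # Invented toy figures, not data from the Qur'an.
    lengths = {"text_a": 1000, "text_b": 100}
    probs = {"frequent_word": 0.5, "rare_word": 0.01}
    print(select_texts_and_words(lengths, probs, threshold=400))
```

Under this assumed formula, rarer words need longer texts before their probabilities are estimated reliably. That is consistent with the abstract's argument that short suras give poor estimates of lexical population probabilities.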