Sura Length and Lexical Probability Estimation in Cluster Analysis of the Qur’an

Authors:
Hermann Moisl
Affiliations:
University of Newcastle, UK
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2009


References

Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. SIGIR '94: Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
Document length normalization. Information Processing and Management: An International Journal, special issue: history of information science.
Introduction to Information Retrieval.
Cluster Analysis.
Stemming the Qur'an. Semitic '04: Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages.
Understanding the thematic structure of the Qur'an: an exploratory multivariate approach. ACLstudent '05: Proceedings of the ACL Student Research Workshop.


Abstract

Thabet [2005] applied cluster analysis to the Qur’an in the hope of generating a classification of the suras that would be useful for understanding its thematic structure. The result was positive, but variation in sura length was a problem: clustering of the shorter suras was found to be unreliable. The present discussion addresses this problem in four parts. The first part summarizes Thabet’s work. The second part argues that unreliable clustering of the shorter suras is a consequence of poor estimation of lexical population probabilities in those suras. The third part proposes a solution based on calculation of a minimum length threshold, using concepts from statistical sampling theory, followed by selection of suras and lexical variables based on that threshold. The fourth part applies the proposed solution to a reanalysis of the Qur’an.
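The abstract does not state the threshold formula itself. As a hedged illustration of the kind of sampling-theory reasoning involved, the Python sketch below uses the textbook normal approximation to the binomial: for a word with population probability p, the observed relative frequency in a text of n tokens falls within ±r·p of p at confidence level z when n is at least roughly z²(1 - p)/(r²p). This formula is an assumption for illustration, not necessarily the formulation used in the paper. The function names `min_length`, `select_suras_and_words` and `relative_frequency_matrix`, the defaults `rel_error=0.1` and `z=1.96`, and the toy token lists are all hypothetical.

```python
import math
from collections import Counter


def min_length(p, rel_error=0.1, z=1.96):
    """Approximate number of word tokens needed so that the observed relative
    frequency of a word with population probability p lies within
    +/- rel_error * p of p at the confidence level implied by z.
    Uses the normal approximation to the binomial (an illustrative assumption,
    not necessarily the paper's formula)."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return math.ceil(z * z * (1 - p) / (rel_error * rel_error * p))


def select_suras_and_words(suras, threshold, rel_error=0.1, z=1.96):
    """Keep suras with at least `threshold` tokens, then keep only words whose
    probability, pooled over the kept suras, can be estimated reliably
    within `threshold` tokens."""
    kept = {name: toks for name, toks in suras.items() if len(toks) >= threshold}
    counts = Counter(tok for toks in kept.values() for tok in toks)
    total = sum(counts.values())
    words = sorted(
        w for w, c in counts.items()
        if min_length(c / total, rel_error, z) <= threshold
    )
    return kept, words


def relative_frequency_matrix(kept, words):
    """Length-normalize: each kept sura becomes a vector of the relative
    frequencies of the selected words."""
    rows = {}
    for name, toks in kept.items():
        c = Counter(toks)
        rows[name] = [c[w] / len(toks) for w in words]
    return rows


# Toy usage with made-up token lists (not Qur'anic data).
suras = {"a": ["x"] * 300 + ["y"] * 299 + ["z"], "b": ["x"] * 10}
kept, words = select_suras_and_words(suras, threshold=500)
rows = relative_frequency_matrix(kept, words)
print(list(kept), words)  # ['a'] ['x', 'y']
```

In the toy run, sura "b" is dropped for being shorter than the threshold, and the rare word "z" is dropped because estimating its probability would need far more tokens than the threshold allows. The remaining rows are length-normalized profiles suitable as clustering input. Increasing `rel_error` or lowering `z` reduces the required length at the cost of less precise probability estimates.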