Sura Length and Lexical Probability Estimation in Cluster Analysis of the Qur’an
ACM Transactions on Asian Language Information Processing (TALIP)
In this paper, we develop a methodology for discovering the thematic structure of the Qur'an. It rests on a fundamental idea in data mining and related disciplines: within a collection of texts, the lexical frequency profiles of the individual texts are a good indicator of their conceptual content, and thus provide a reliable criterion for classifying them relative to one another. We apply this idea to the discovery of thematic interrelationships among the suras (chapters) of the Qur'an by abstracting lexical frequency data from them and then applying hierarchical cluster analysis to that data. The results reported here indicate that the proposed methodology yields usable insight into the Qur'an on the basis of its lexical semantics.
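The two steps named in the abstract, building lexical frequency profiles and then clustering them hierarchically, can be illustrated with a minimal sketch. It is not the authors' implementation. The toy English texts, the whitespace tokenizer, the Euclidean distance, and the average-linkage rule are all assumptions made for illustration, since the abstract does not specify these choices, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def frequency_profile(text, vocabulary):
    """Relative frequency of each vocabulary word in one text."""
    tokens = text.lower().split()  # assumption: naive whitespace tokenization
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[word] / total for word in vocabulary]


def euclidean(u, v):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def agglomerate(profiles):
    """Average-linkage agglomerative clustering (an assumed linkage rule).

    Returns the merges in the order they occur, each as
    (cluster_a, cluster_b, distance), where clusters are tuples of text names.
    """
    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        pairs = [(x, y) for x in a for y in b]
        return sum(euclidean(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters, then repeat until one remains.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical placeholder texts, not actual sura content.
texts = {
    "t1": "mercy mercy forgiveness prayer",
    "t2": "mercy forgiveness prayer prayer",
    "t3": "law inheritance contract law",
    "t4": "law contract contract inheritance",
}
vocabulary = sorted({word for text in texts.values() for word in text.split()})
profiles = {name: frequency_profile(text, vocabulary) for name, text in texts.items()}
merges = agglomerate(profiles)

for a, b, distance in merges:
    print(a, "+", b, f"at distance {distance:.3f}")
```

In this toy data, texts that share vocabulary merge first and the two resulting groups join only at the final step, which is the kind of structure a dendrogram over real frequency profiles would display. On real data, choices such as text-length normalization and the selection of vocabulary would need more care than this sketch gives them.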