This paper introduces the use of 15 different readability indices as a fingerprint for classifying documents into categories. Classification based on such fingerprints alone is not necessarily superior to document categorization based on dedicated dictionaries, but the fingerprints can improve the overall classification rate when proper data fusion techniques are applied. For other text mining applications, such as language classification, plagiarism detection, or author identification, the accuracy of text categorization methods based on readability fingerprints can even exceed that of a dictionary-based approach. A novel extension to the readability indices is a set of histograms based on the word length of all the dictionary words used in the text and on a dictionary of the most common easy words in the English language.
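To make the idea of a readability fingerprint concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not list the 15 indices, so the sketch assumes two well-known ones (the Automated Readability Index and the Coleman-Liau index) and appends a normalized word-length histogram. The function name `fingerprint`, the parameter `max_len`, and the regex-based tokenization are illustrative assumptions. The dictionary filtering and the easy-word list mentioned above are omitted.

```python
import re
from collections import Counter


def fingerprint(text, max_len=10):
    """Return a small feature vector: [ARI, Coleman-Liau, word-length histogram...].

    Hypothetical helper for illustration; the paper's actual 15 indices
    and its dictionary-based histograms are not reproduced here.
    """
    # Simplification: words are runs of ASCII letters, sentences end at . ! or ?
    words = re.findall(r"[A-Za-z]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    if n_words == 0:
        raise ValueError("text contains no words")
    n_sents = max(len(sentences), 1)
    letters = sum(len(w) for w in words)

    # Automated Readability Index (letters only here; the standard
    # definition also counts digits as characters).
    ari = 4.71 * letters / n_words + 0.5 * n_words / n_sents - 21.43

    # Coleman-Liau index: L = letters per 100 words, S = sentences per 100 words.
    cli = (0.0588 * (100.0 * letters / n_words)
           - 0.296 * (100.0 * n_sents / n_words)
           - 15.8)

    # Word-length histogram, normalized to relative frequencies;
    # words longer than max_len share the last bin.
    counts = Counter(min(len(w), max_len) for w in words)
    hist = [counts.get(k, 0) / n_words for k in range(1, max_len + 1)]

    return [ari, cli] + hist


if __name__ == "__main__":
    print(fingerprint("The cat sat. The dog ran fast."))
```

A vector of this kind could be passed to any standard classifier, or fused with dictionary-based features as the abstract suggests; which fusion technique the paper uses is not stated here.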