Information retrieval using a singular value decomposition model of latent semantic structure
SIGIR '88 Proceedings of the 11th annual international ACM SIGIR conference on Research and development in information retrieval
An Evaluation of Statistical Approaches to Text Categorization
Information Retrieval
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
A class-feature-centroid classifier for text categorization
Proceedings of the 18th international conference on World wide web
Eigenvector-based clustering using aggregated similarity matrices
Proceedings of the 2010 ACM Symposium on Applied Computing
Nearest neighbor pattern classification
IEEE Transactions on Information Theory
Hi-index | 0.00 |
Text classification is a process where documents are categorized usually by topic, place, readability easiness, etc. For text classification by topic, a well-known method is Singular Value Decomposition. For text classification by readability, "Flesch Reading Ease index" calculates the readability easiness level of a document (e.g. easy, medium, advanced). In this paper, we propose Singular Value Decomposition combined either with Cosine Similarity or with Aggregated Similarity Matrices to categorize documents by readability easiness and by topic. We experimentally compare both methods with Flesch Reading Ease index, and the vector-based cosine similarity method on a synthetic and a real data set (Reuters-21578). Both methods clearly outperform all other comparison partners.