Text classification by aggregation of SVD eigenvectors

  • Authors:
  • Panagiotis Symeonidis;Ivaylo Kehayov;Yannis Manolopoulos

  • Affiliations:
  • Department of Informatics, Aristotle University, Thessaloniki, Greece;Department of Informatics, Aristotle University, Thessaloniki, Greece;Department of Informatics, Aristotle University, Thessaloniki, Greece

  • Venue:
  • ADBIS'12 Proceedings of the 16th East European conference on Advances in Databases and Information Systems
  • Year:
  • 2012

Abstract

Text classification is the process of assigning documents to categories, typically by topic, place, readability, etc. For text classification by topic, a well-known method is Singular Value Decomposition (SVD). For text classification by readability, the Flesch Reading Ease index calculates the readability level of a document (e.g. easy, medium, advanced). In this paper, we propose combining SVD with either Cosine Similarity or Aggregated Similarity Matrices to categorize documents both by readability and by topic. We experimentally compare both methods with the Flesch Reading Ease index and with the vector-based cosine similarity method on a synthetic data set and a real data set (Reuters-21578). Both proposed methods clearly outperform the methods they are compared against.
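
To illustrate the first of the two combinations named in the abstract (SVD followed by cosine similarity), here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the toy term-document matrix, the labels, the rank k = 2, the nearest-document decision rule, and the function name lsi_classify are all assumptions made for illustration. The Aggregated Similarity Matrices variant is not shown, since the abstract does not describe how it is built.

```python
import numpy as np

def lsi_classify(term_doc, labels, query, k=2):
    """Label a query with the class of its most similar training document,
    measured by cosine similarity in a rank-k SVD space.

    term_doc: (n_terms, n_docs) matrix of term counts (hypothetical toy data).
    labels:   one class label per training document.
    query:    (n_terms,) term-count vector of the document to classify.
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]

    # Training documents in the reduced space: rows of V_k * s_k,
    # which equals term_doc.T @ U_k.
    docs_k = Vt[:k].T * s[:k]

    # Project the query the same way, so both live in one space.
    query_k = query @ U_k

    # Cosine similarity between the query and every training document.
    denom = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(query_k) + 1e-12
    sims = docs_k @ query_k / denom

    # Assumed decision rule: take the label of the nearest document.
    return labels[int(np.argmax(sims))], sims

# Toy corpus (assumed): rows are the terms "stock", "market", "goal", "match";
# columns are four training documents.
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
])
labels = ["finance", "finance", "sports", "sports"]

predicted, similarities = lsi_classify(A, labels, np.array([1.0, 1.0, 0.0, 0.0]))
print(predicted, np.round(similarities, 3))
```

The same function could be used for readability classes by replacing the topic labels with levels such as easy, medium, and advanced, assuming the feature vectors carry information relevant to readability.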