WORDS AS CLASSIFIERS OF DOCUMENTS ACCORDING TO THEIR HISTORICAL PERIOD AND THE ETHNIC ORIGIN OF THEIR AUTHORS

  • Authors:
  • Yaakov HaCohen-Kerner;Dror Mughaz;Hananya Beck;Elchai Yehudai

  • Affiliations:
  • Department of Computer Science, Jerusalem College of Technology (Machon Lev), Jerusalem, Israel;Department of Computer Science, Jerusalem College of Technology (Machon Lev), Jerusalem, Israel,Department of Computer Science, Bar-Ilan University, Ramat-Gan, Israel;Department of Computer Science, Jerusalem College of Technology (Machon Lev), Jerusalem, Israel;Department of Computer Science, Jerusalem College of Technology (Machon Lev), Jerusalem, Israel

  • Venue:
  • Cybernetics and Systems
  • Year:
  • 2008

Abstract

Text classification is challenging because of the large number of features, the dependencies among them, and the large number of training documents. In this research, we investigate whether words are appropriate features for classifying documents according to the ethnic group of their authors and/or the historical period in which they were written. To the best of our knowledge, these kinds of classification have not previously been explored. In addition, we examine Forman's (2003) claim that common words should not be used as features in classification tasks. The application domain is articles on Jewish law written in Hebrew-Aramaic, which have received little study. Experiments using SVM and InfoGain yield highly successful results (more than 95%). The results indicate that using common words as features contributes to making the learning task efficient and more accurate.
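The abstract names InfoGain as a feature-ranking criterion but gives no details of how it was applied. As an illustration only, the following Python sketch scores each word by the information gain of its binary presence feature with respect to the document class (for example, historical period or the author's ethnic group), IG(C; W) = H(C) - H(C | W). The function names, the binary presence representation, and the optional `stop_words` filter (a hypothetical way to compare keeping versus discarding common words, the question raised by Forman's claim) are assumptions, not the authors' implementation. The SVM classification stage is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def info_gain(docs, labels, word):
    """Information gain of the binary feature 'word occurs in the document'.

    docs: list of token lists; labels: parallel list of class labels.
    """
    with_word = [lab for doc, lab in zip(docs, labels) if word in doc]
    without_word = [lab for doc, lab in zip(docs, labels) if word not in doc]
    n = len(labels)
    # H(C | W): class entropy in each partition, weighted by partition size.
    conditional = (len(with_word) / n) * entropy(with_word) + (
        len(without_word) / n
    ) * entropy(without_word)
    return entropy(labels) - conditional


def rank_words(docs, labels, top_k=10, stop_words=frozenset()):
    """Return the top_k words by information gain (ties broken alphabetically).

    stop_words (hypothetical) removes common words before ranking, so the
    two settings, with and without common words, can be compared.
    """
    vocab = {w for doc in docs for w in doc} - set(stop_words)
    ranked = sorted(vocab, key=lambda w: (-info_gain(docs, labels, w), w))
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy, made-up corpus: tokens "a" and "d" separate the two classes perfectly.
    docs = [["a", "b"], ["a", "c"], ["d", "b"], ["d", "c"]]
    labels = ["p1", "p1", "p2", "p2"]
    print(rank_words(docs, labels, top_k=2))  # ['a', 'd']
    print(rank_words(docs, labels, top_k=2, stop_words={"a"}))  # ['d', 'b']
```

In a fuller pipeline, the top-ranked words would become the feature vector fed to a classifier such as an SVM. Sorting on `(-gain, word)` keeps the ranking deterministic when several words share the same score.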