WORDS AS CLASSIFIERS OF DOCUMENTS ACCORDING TO THEIR HISTORICAL PERIOD AND THE ETHNIC ORIGIN OF THEIR AUTHORS

Authors:
Yaakov HaCohen-Kerner;Dror Mughaz;Hananya Beck;Elchai Yehudai
Affiliations:
Department of Computer Science, Jerusalem College of Technology (Machon Lev), Jerusalem, Israel;Department of Computer Science, Jerusalem College of Technology (Machon Lev), Jerusalem, Israel,Department of Computer Science, Bar-Ilan University, Ramat-Gan, Israel;Department of Computer Science, Jerusalem College of Technology (Machon Lev), Jerusalem, Israel;Department of Computer Science, Jerusalem College of Technology (Machon Lev), Jerusalem, Israel
Venue:
Cybernetics and Systems
Year:
2008

References:
C4.5: programs for machine learning.
The nature of statistical learning theory.
Little words can make a big difference for text classification. SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval.
Support-Vector Networks. Machine Learning.
Training algorithms for linear text classifiers. SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval.
Feature selection, perceptron learning, and a usability case study for text categorization. Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval.
Inductive learning algorithms and representations for text categorization. Proceedings of the seventh international conference on Information and knowledge management.
Fast training of support vector machines using sequential minimal optimization. Advances in kernel methods.
A re-examination of text categorization methods. Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval.
Mining online text. Communications of the ACM.
Automatic Indexing: An Experimental Inquiry. Journal of the ACM (JACM).
Text filtering by boosting naive Bayes classifiers. SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval.
Machine learning in automated text categorization. ACM Computing Surveys (CSUR).
Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms.
A Study of Approaches to Hypertext Categorization. Journal of Intelligent Information Systems.
Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm. Machine Learning.
Text Categorization with Support Vector Machines: Learning with Many Relevant Features. ECML '98 Proceedings of the 10th European Conference on Machine Learning.
A Comparative Study on Feature Selection in Text Categorization. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning.
Automatic Text Categorization: Case Study. SBRN '02 Proceedings of the VII Brazilian Symposium on Neural Networks (SBRN'02).
An extensive empirical study of feature selection metrics for text classification. The Journal of Machine Learning Research.
Using the feature projection technique based on a normalized voting method for text classification. Information Processing and Management: an International Journal.
Text categorization with many redundant features: using aggressive feature selection to make SVMs competitive with C4.5. ICML '04 Proceedings of the twenty-first international conference on Machine learning.
Improving performance of text categorization by combining filtering and support vector machines. Journal of the American Society for Information Science and Technology.
Competitive generative models with structure learning for NLP classification tasks. EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing.
Weighted proportional k-interval discretization for naive-Bayes classifiers. PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining.
Techniques for improving the performance of naive bayes for text classification. CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing.

Abstract

Text classification presents challenges due to the large number of features, their dependencies, and the large number of training documents. In this research, we investigate whether words are appropriate features for classifying documents by the ethnic group of their authors and/or by the historical period in which they were written. To the best of our knowledge, these kinds of classification have not been explored before. In addition, we investigate Forman's (2003) claim against using common words in classification tasks. The application domain is articles on Jewish law written in Hebrew-Aramaic, which have been little studied. Different experiments using SVM and InfoGain yield highly successful results (more than 95%). The results indicate that using common words as features contributes to making the learning task efficient and more accurate.
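
As a rough illustration of how word features might be scored, the sketch below computes the information gain of a word's presence with respect to document labels. It assumes that "InfoGain" in the abstract refers to the standard information-gain criterion. The toy tokens, the labels "A" and "B", and the function names are hypothetical and do not come from the paper. It is not the authors' implementation, and the SVM step is omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def info_gain(docs, labels, word):
    """Information gain of the binary feature 'word occurs in document'.

    docs is a list of token sets, labels the matching class labels.
    """
    with_word = [lab for doc, lab in zip(docs, labels) if word in doc]
    without_word = [lab for doc, lab in zip(docs, labels) if word not in doc]
    total = len(labels)
    conditional = (
        (len(with_word) / total) * entropy(with_word)
        + (len(without_word) / total) * entropy(without_word)
    )
    return entropy(labels) - conditional


def rank_words(docs, labels):
    """Rank vocabulary words by descending information gain (ties: alphabetical)."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda w: (-info_gain(docs, labels, w), w))


# Hypothetical toy corpus: each document is a set of tokens.
docs = [
    {"the", "alpha"},
    {"the", "alpha", "x"},
    {"the", "beta"},
    {"the", "beta", "x"},
]
labels = ["A", "A", "B", "B"]

for word in rank_words(docs, labels):
    print(word, round(info_gain(docs, labels, word), 3))
```

In this toy corpus a word that occurs in every document scores zero, because its presence does not separate the classes. Whether common words help on real data is the empirical question the abstract addresses.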