Document classification presents challenges due to the large number of features, their dependencies, and the large number of training documents. In this research, we investigated the use of six stylistic feature sets (42 features in total) and/or six name-based feature sets (234 features in total) for various combinations of the following classification tasks: the ethnic group of the author, the period in which the document was written, and the place where it was written. The investigated corpus contains Jewish Law articles written in Hebrew–Aramaic, which present interesting problems for classification. Our system CUISINE (Classification UsIng Stylistic feature sets and/or NamE-based feature sets) achieves accuracy between 90.71% and 98.99% across the seven classification experiments (ethnicity, time, place, ethnicity&time, ethnicity&place, time&place, ethnicity&time&place). For the first six tasks, the stylistic feature sets in general, and the quantitative feature set in particular, are enough for excellent classification results. In contrast, the name-based feature sets perform rather poorly on these tasks. However, for the most complex task (ethnicity&time&place), a hill-climbing model using all feature sets significantly improves the classification results. Most of the stylistic features (34 of 42) are language-independent and domain-independent. These features might be useful to the community at large, at least for rather simple tasks. © 2010 Wiley Periodicals, Inc.
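The abstract does not specify how the hill-climbing model combines feature sets. As a rough illustration only, the sketch below shows one common variant, greedy forward selection over named feature sets: it adds whichever set most improves a score and stops when no addition helps. The function name `hill_climb_feature_sets`, the labels `other_stylistic` and `name_based`, and the toy gains are all assumptions made for this example; they are not taken from CUISINE, and in practice the score would be something like cross-validated accuracy of a classifier trained on the selected sets.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def hill_climb_feature_sets(
    candidates: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Greedy forward hill climbing over named feature sets (illustrative)."""
    remaining = set(candidates)
    selected: FrozenSet[str] = frozenset()
    best = float("-inf")
    while remaining:
        # Score every one-step extension of the current selection.
        trials = [(score(selected | {name}), name) for name in sorted(remaining)]
        top_score, top_name = max(trials)
        if top_score <= best:
            break  # No extension improves the score: local optimum reached.
        selected = selected | {top_name}
        best = top_score
        remaining.remove(top_name)
    return selected, best


# Toy, made-up gains standing in for a real evaluation (assumption, not data
# from the paper): each feature set adds or subtracts a fixed amount.
TOY_GAINS = {"quantitative": 0.5, "other_stylistic": 0.2, "name_based": -0.1}


def toy_score(selected: FrozenSet[str]) -> float:
    """Hypothetical additive score used only to exercise the search."""
    return sum(TOY_GAINS[name] for name in selected)


if __name__ == "__main__":
    chosen, value = hill_climb_feature_sets(TOY_GAINS, toy_score)
    print(sorted(chosen), round(value, 2))
```

Because the search is greedy, it can stop at a local optimum; a set that only helps in combination with another would be missed by this variant.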