In this paper we compare n-grams and morphological normalization, two inherently different text-preprocessing methods, for text classification on a Croatian-English parallel corpus. We compare the techniques by measuring computational performance (execution time and memory consumption) as well as classification performance. We show that although n-grams achieve classification performance comparable to traditional word-based feature extraction and can substitute for morphological normalization, they are computationally much more demanding.
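To make the trade-off concrete, the following is a minimal sketch of the kind of comparison described, not the setup used in the paper. It assumes character-level n-grams with n = 4, a plain whitespace tokenizer for the word-based features, and two toy documents; the function names (`word_features`, `char_ngram_features`, `profile`) are hypothetical. It counts the distinct features each extractor produces and records execution time and peak memory with the Python standard library.

```python
import time
import tracemalloc
from collections import Counter


def word_features(text):
    """Bag of lowercase, whitespace-separated words (no normalization)."""
    return Counter(text.lower().split())


def char_ngram_features(text, n=4):
    """Bag of overlapping character n-grams; single spaces are kept."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def profile(extract, docs):
    """Return (distinct feature count, seconds, peak bytes) for one extractor."""
    tracemalloc.start()
    start = time.perf_counter()
    vocab = set()
    for doc in docs:
        vocab.update(extract(doc))
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return len(vocab), elapsed, peak


# Toy documents for illustration only (assumption, not corpus data).
docs = [
    "the classifier classifies classified documents",
    "klasifikator klasificira klasificirane dokumente",
]

words = profile(word_features, docs)
ngrams = profile(char_ngram_features, docs)
print("word features:   ", words)
print("4-gram features: ", ngrams)
```

In this sketch, inflected forms such as "classifies" and "classified" are distinct word features but share n-grams such as "clas", which is the sense in which n-grams can stand in for morphological normalization. The price is a larger feature set to build and store, even on two short documents.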