It is well known that the effectiveness of a text categorization system depends on more than the learning algorithm; text representation factors also play a role. This paper examines how the effectiveness of text classifiers is linked to five text representation factors: “stop words removal”, “word stemming”, “indexing”, “weighting”, and “normalization”. Statistical analyses of the experimental results show that performing “normalization” always improves the effectiveness of text classifiers significantly. The effects of the other factors are not as great as expected. Contrary to common expectation, a simple binary indexing method can sometimes be helpful for text categorization.
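To make the five factors concrete, the following is a minimal Python sketch of a representation pipeline in which each factor can be switched on or off. It is an illustration only, not the setup used in the paper: the stop-word list, the suffix-stripping `naive_stem` function, the smoothed IDF formula, and the choice of L2 normalization are all assumptions, since the abstract does not specify which variants were tested.

```python
import math
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "is", "and", "to", "in"}


def naive_stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text, remove_stop_words=True, stem=True):
    """Factors 1 and 2: stop words removal and word stemming."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return tokens


def index_terms(tokens, binary=False):
    """Factor 3: indexing, as binary presence or raw term frequency."""
    counts = Counter(tokens)
    if binary:
        return {t: 1.0 for t in counts}
    return {t: float(c) for t, c in counts.items()}


def idf_weights(indexed_docs):
    """Factor 4: weighting, here IDF as log(N / df) + 1 (assumed variant)."""
    n = len(indexed_docs)
    df = Counter(t for doc in indexed_docs for t in doc)
    return {t: math.log(n / df[t]) + 1.0 for t in df}


def l2_normalize(vec):
    """Factor 5: normalization, scaling the vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    if norm == 0.0:
        return dict(vec)
    return {t: v / norm for t, v in vec.items()}


def represent(docs, binary=False, weight=True, normalize=True):
    """Build one sparse vector (dict of term -> value) per document."""
    indexed = [index_terms(preprocess(d), binary=binary) for d in docs]
    if weight:
        idf = idf_weights(indexed)
        indexed = [{t: v * idf[t] for t, v in doc.items()} for doc in indexed]
    if normalize:
        indexed = [l2_normalize(doc) for doc in indexed]
    return indexed


docs = ["The cat is chasing the cats", "Dogs chased a cat"]
vectors = represent(docs, binary=True, weight=True, normalize=True)
```

Toggling `binary`, `weight`, and `normalize` in `represent` gives a simple way to compare representation variants on the same corpus before passing the vectors to any classifier.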