Fragments and text categorization

Authors:
Jan Blaták;Eva Mráková;Luboš Popelínský
Affiliations:
Masaryk University, Czech Republic;Masaryk University, Czech Republic;Masaryk University, Czech Republic
Venue:
ACLdemo '04 Proceedings of the ACL 2004 on Interactive poster and demonstration sessions
Year:
2004

Abstract

We introduce two novel methods of text categorization in which documents are split into fragments. We conducted experiments on English, French and Czech; in all cases the task was binary document classification. Both methods increased the accuracy of text categorization, and for the Naïve Bayes classifier the increase was significant.
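
The abstract does not say how fragments are formed or how fragment-level decisions are combined, so the following is only an illustrative sketch of one possible reading, not the authors' method. It assumes fixed-size word windows as fragments, a small multinomial Naive Bayes classifier with add-one smoothing trained on those fragments, and a majority vote over a document's fragments. The names `split_into_fragments`, `train_on_fragments`, `classify_document` and the `size` parameter are hypothetical, and the toy data is invented for the example.

```python
import math
from collections import Counter


def split_into_fragments(text, size=5):
    """Split a document into consecutive windows of `size` words (assumed scheme)."""
    words = text.lower().split()
    return [words[i:i + size] for i in range(0, len(words), size)]


class NaiveBayes:
    """Multinomial Naive Bayes over word lists, with add-one smoothing."""

    def fit(self, samples, labels):
        self.classes = sorted(set(labels))
        self.sample_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for words, label in zip(samples, labels):
            self.word_counts[label].update(words)
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, words):
        total = sum(self.sample_counts.values())

        def log_score(c):
            n = sum(self.word_counts[c].values())
            score = math.log(self.sample_counts[c] / total)
            for w in words:
                score += math.log((self.word_counts[c][w] + 1) / (n + len(self.vocab)))
            return score

        return max(self.classes, key=log_score)


def train_on_fragments(docs, labels, size=5):
    """Train on fragments; each fragment inherits its document's label (assumption)."""
    frags, frag_labels = [], []
    for doc, label in zip(docs, labels):
        for frag in split_into_fragments(doc, size):
            frags.append(frag)
            frag_labels.append(label)
    return NaiveBayes().fit(frags, frag_labels)


def classify_document(model, text, size=5):
    """Classify each fragment, then take a majority vote (assumed combination rule)."""
    fragments = split_into_fragments(text, size)
    if not fragments:
        raise ValueError("cannot classify an empty document")
    votes = Counter(model.predict(f) for f in fragments)
    return votes.most_common(1)[0][0]


# Toy binary task, invented purely for illustration.
docs = [
    "the match ended with a late goal and the team won the cup",
    "the striker scored a goal in the final match",
    "the parliament passed the budget after a long vote",
    "the minister lost the vote in parliament",
]
labels = ["sport", "sport", "politics", "politics"]
model = train_on_fragments(docs, labels)
print(classify_document(model, "goal match team cup striker"))
print(classify_document(model, "parliament vote minister budget"))
```

Majority voting is only one way to combine fragment-level predictions; summing the per-fragment log scores would be another, and the abstract does not indicate which approach, if either, the paper uses.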