In the context of automatic document categorization, we propose a new, flexible approach to electronic document categorization situated at the junction of knowledge engineering and machine learning. Our approach assigns an HTML document to one or more categories (paper, call for papers, email, ...) using three types of criteria: physical, logical, and discursive. From a set of pre-categorized documents, the approach generates a base of categorization rules, which is then used to categorize new documents. Flexibility is achieved by associating with each rule a weight that represents its importance in discriminating between the possible categories. This weight is calculated using the Zadeh min t-norm and is dynamically modified at each new categorization. The proposed approach was evaluated on a corpus of 615 HTML documents belonging to different predefined categories. The results obtained are satisfactory and provide a preliminary validation of our approach.
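
As a rough illustration of how a rule weight might be combined with criterion matches through the Zadeh min t-norm, the following minimal sketch scores a document against a small rule base. It is not the authors' implementation: the `Rule` class, the criterion names, the weights, the max aggregation per category, and the 0.5 threshold are all hypothetical choices made for the example. The dynamic weight update is left out because the abstract does not specify it.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    category: str               # category the rule votes for
    criteria: tuple[str, ...]   # names of criteria the rule requires (hypothetical)
    weight: float               # importance of the rule, in [0, 1]


def firing_strength(rule: Rule, degrees: dict[str, float]) -> float:
    """Combine criterion degrees and the rule weight with the min t-norm."""
    # Degree to which the document satisfies all criteria of the rule.
    matched = min((degrees.get(name, 0.0) for name in rule.criteria), default=0.0)
    # The rule cannot fire more strongly than its own weight allows.
    return min(rule.weight, matched)


def categorize(rules: list[Rule], degrees: dict[str, float],
               threshold: float = 0.5) -> dict[str, float]:
    """Return every category whose best rule reaches the threshold."""
    scores: dict[str, float] = {}
    for rule in rules:
        strength = firing_strength(rule, degrees)
        # Assumption: a category's score is its strongest rule (max aggregation).
        scores[rule.category] = max(scores.get(rule.category, 0.0), strength)
    return {cat: s for cat, s in scores.items() if s >= threshold}


# Hypothetical rule base and criterion degrees for one HTML document.
rules = [
    Rule("call for papers", ("has_deadline_list", "imperative_tone"), 0.9),
    Rule("paper", ("has_abstract_heading", "has_references"), 0.8),
    Rule("email", ("has_header_fields",), 0.7),
]
doc = {
    "has_deadline_list": 0.8,
    "imperative_tone": 0.6,
    "has_abstract_heading": 0.1,
    "has_references": 0.4,
    "has_header_fields": 0.0,
}
result = categorize(rules, doc)
print(result)
```

Because the function returns every category above the threshold, a document can be assigned to more than one category, which matches the "one or more categories" behavior described in the abstract.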