Flexible document categorisation

  • Authors:
  • Jebari Chaker; Ounalli Habib

  • Affiliations:
  • Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh, Kingdom of Saudi Arabia; Département d'Informatique, Faculté des Sciences de Tunis, Université de Tunis El'Manar, Tunis, Tunisie

  • Venue:
  • AIKED'05 Proceedings of the 4th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases
  • Year:
  • 2005

Abstract

We propose a new flexible approach to the automatic categorization of electronic documents, situated at the junction of knowledge engineering and machine learning approaches. The approach assigns an HTML document to one or more categories (paper, call for papers, email, ...) using three types of criteria: physical, logical and discursive. From a set of pre-categorised documents, it generates a base of categorization rules, which is then used to categorise new documents. Flexibility is achieved by associating with each rule a weight that represents its importance in discriminating between the possible categories. This weight is calculated using the Zadeh min t-norm and is dynamically modified at each new categorization. The approach was evaluated on a corpus of 615 HTML documents belonging to different predefined categories. The results obtained are satisfactory and constitute a first validation of the approach.
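
The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the rule format, so the criterion names, the rule base, the use of the maximum to combine rules of the same category, and the 0.5 threshold below are all hypothetical. Only the use of the Zadeh min t-norm (the conjunction of fuzzy degrees is their minimum) and the possibility of assigning several categories come from the abstract. The dynamic update of weights is not sketched, since the abstract does not specify it.

```python
from typing import Dict, List


def min_t_norm(degrees: List[float]) -> float:
    """Zadeh min t-norm: the conjunction of fuzzy degrees is their minimum."""
    if not degrees:
        raise ValueError("at least one degree is required")
    return min(degrees)


# Hypothetical rule base: each category has a list of rules, and each rule
# is a list of criterion names that must hold together (assumed format).
RULES: Dict[str, List[List[str]]] = {
    "paper": [["has_abstract", "has_references"]],
    "call for papers": [["has_deadline", "has_topics_list"]],
}


def categorise(
    doc_degrees: Dict[str, float],
    rules: Dict[str, List[List[str]]],
    threshold: float = 0.5,  # assumed cut-off, not from the source
) -> Dict[str, float]:
    """Return every category whose best rule weight reaches the threshold."""
    scores: Dict[str, float] = {}
    for category, rule_list in rules.items():
        # Weight of a rule = min of the degrees of its criteria (min t-norm).
        weights = [
            min_t_norm([doc_degrees.get(criterion, 0.0) for criterion in rule])
            for rule in rule_list
        ]
        # Assumption: rules of one category are combined with max.
        scores[category] = max(weights, default=0.0)
    return {c: s for c, s in scores.items() if s >= threshold}


# Hypothetical degrees to which one HTML document satisfies each criterion.
doc = {
    "has_abstract": 0.9,
    "has_references": 0.7,
    "has_deadline": 0.2,
    "has_topics_list": 0.8,
}
result = categorise(doc, RULES)
```

With these invented degrees, the "paper" rule receives weight min(0.9, 0.7) = 0.7 and is retained, while the "call for papers" rule receives min(0.2, 0.8) = 0.2 and falls below the assumed threshold.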