A semantic case-based reasoning framework for text categorization

  • Authors:
  • Valentina Ceausu; Sylvie Desprès

  • Affiliations:
  • CRIP, University of Paris 5, Paris, France; LIPN, UMR, CNRS, University of Paris 13, Villetaneuse, France

  • Venue:
  • ISWC'07/ASWC'07: Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference
  • Year:
  • 2007

Abstract

This paper presents a semantic case-based reasoning framework for text categorization. Text categorization is the task of classifying text documents under predefined categories. Our application field is accidentology, and the goal of our framework is to classify documents describing real road accidents under predefined road accident prototypes, which are also described by text documents. Accidents are described by accident reports, while accident prototypes are described by accident scenarios. Text categorization is thus done by assigning each accident report to an accident scenario, which highlights the particular mechanisms leading to the accident. We propose a textual case-based reasoning (TCBR) approach, which allows us to integrate both textual and domain knowledge in order to carry out this categorization. CBR solves a new problem (the target case) by identifying its similarity to one or several previously solved problems (source cases) stored in a case base and by adapting their known solutions. The cases of our framework are created from text. Most TCBR applications create cases from text by using Information Retrieval techniques, which leads to knowledge-poor descriptions of cases. We show that using semantic resources (two ontologies of accidentology) makes it possible to overcome this difficulty and allows us to enrich cases with formal knowledge. In this paper, we argue that semantic resources are likely to improve the quality of cases created from text and that such resources can therefore support the reasoning cycle. We illustrate this claim with our framework, developed to classify documents in the accidentology domain.
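
The retrieval step of CBR described in the abstract (matching a target case against source cases in a case base) can be illustrated with a minimal sketch. It is not the authors' method: it assumes, hypothetically, that each accident report and each accident scenario has already been reduced to a set of ontology concept labels, and it uses Jaccard overlap as a stand-in similarity measure. The names `Case`, `jaccard`, and `categorize`, as well as the toy concepts and scenario names, are invented for illustration. The adaptation step of the CBR cycle is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """A case reduced to a set of concept labels (hypothetical representation)."""
    name: str
    concepts: frozenset


def jaccard(a: frozenset, b: frozenset) -> float:
    """Overlap of two concept sets: |a & b| / |a | b|, or 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def categorize(target: Case, case_base: list) -> tuple:
    """Retrieve step only: return (name, similarity) of the closest source case."""
    if not case_base:
        raise ValueError("case base is empty")
    best = max(case_base, key=lambda source: jaccard(target.concepts, source.concepts))
    return best.name, jaccard(target.concepts, best.concepts)


# Invented toy data: concept labels and scenario names are illustrative only.
scenarios = [
    Case("scenario_pedestrian_crossing",
         frozenset({"pedestrian", "crossing", "car", "urban_road"})),
    Case("scenario_loss_of_control",
         frozenset({"car", "bend", "wet_road", "speed"})),
]
report = Case("report_001", frozenset({"pedestrian", "car", "urban_road", "night"}))

label, score = categorize(report, scenarios)
print(label, round(score, 2))
```

In this toy setting the report shares three of five concepts with the pedestrian scenario and only one of seven with the loss-of-control scenario, so it is assigned to the former. A knowledge-rich framework of the kind the abstract argues for would replace the flat set overlap with a similarity that exploits the ontology structure.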