Automatic document classification: natural language processing, statistical analysis, and expert system techniques used together

  • Authors:
  • M. J. Blosseville;G. Hébrail;M. G. Monteil;N. Pénot

  • Affiliations:
  • Électricité de France, Clamart, France;Électricité de France, Clamart, France;Électricité de France, Clamart, France;Électricité de France, Clamart, France

  • Venue:
  • SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 1992

Abstract

In this paper we describe an automated method for classifying research project descriptions: a human expert classifies a sample set of projects into a set of disjoint, predefined classes, and the computer then learns from this sample how to classify new projects into these classes. Both the textual and the non-textual information associated with the projects is used in the learning and classification phases. Textual information is processed by two successive methods of analysis: a natural language analysis followed by a statistical analysis. Non-textual information is processed by a symbolic learning technique. We present the results of experiments on real data: two different classifications of our research projects.
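The abstract does not say which statistical method is applied to the text, so the following is only a minimal sketch of the general learn-from-a-labelled-sample idea, not the authors' method. It assumes a simple nearest-centroid classifier over term counts with cosine similarity as a stand-in for the statistical analysis. The function `tokenize` is a crude placeholder for the natural language analysis stage, and the symbolic learning on non-textual attributes is not shown. All function names and the toy sample data are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for the paper's natural language analysis:
    # lowercase, split on whitespace, keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


def train_centroids(samples):
    """samples: list of (text, label) pairs classified by a human expert.
    Returns one aggregated term-count profile (centroid) per class."""
    centroids = {}
    for text, label in samples:
        centroids.setdefault(label, Counter()).update(tokenize(text))
    return centroids


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, centroids):
    # Classes are disjoint, so return the single best-scoring class.
    vec = Counter(tokenize(text))
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy expert-labelled sample (hypothetical data, not from the paper).
samples = [
    ("nuclear reactor safety study", "nuclear"),
    ("reactor core cooling simulation", "nuclear"),
    ("electricity demand forecasting model", "economics"),
    ("tariff and demand analysis", "economics"),
]
centroids = train_centroids(samples)
print(classify("safety of the reactor cooling loop", centroids))  # nuclear
```

Here the learning phase reduces to summing term counts per class, and the classification phase picks the class whose profile is most similar to the new description. A real system along the lines of the abstract would replace `tokenize` with proper linguistic analysis and combine this textual score with rules learned from the non-textual attributes.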