Automatic document classification: natural language processing, statistical analysis, and expert system techniques used together

Authors:
M. J. Blosseville; G. Hébrail; M. G. Monteil; N. Pénot
Affiliations:
Electricite de France, Clamart, France; Electricite de France, Clamart, France; Electricite de France, Clamart, France; Electricite de France, Clamart, France
Venue:
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
1992

Abstract

In this paper we describe an automated method of classifying research project descriptions: a human expert classifies a sample set of projects into a set of disjoint and pre-defined classes, and then the computer learns from this sample how to classify new projects into these classes. Both textual and non-textual information associated with the projects are used in the learning and classification phases. Textual information is processed by two methods of analysis: a natural language analysis followed by a statistical analysis. Non-textual information is processed by a symbolic learning technique. We present the results of some experiments done on real data: two different classifications of our research projects.
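
The workflow outlined in the abstract (learn from an expert-labelled sample, then classify new projects using both textual and non-textual information) can be illustrated with a small sketch. The Python below is an illustration under stated assumptions, not the authors' system: it swaps in a naive Bayes word-frequency model for the paper's natural language and statistical analysis, and simple per-class attribute counts for its symbolic learning technique. All names (`train`, `classify`, the `dept` attribute, the sample projects) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Crude stand-in for natural language analysis: lowercase alphabetic words.
    return [w for w in text.lower().split() if w.isalpha()]


def train(samples):
    """samples: list of (description, attributes dict, expert-assigned label)."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    attr_counts = defaultdict(Counter)  # (attribute, value) -> label frequencies
    label_counts = Counter()
    for text, attrs, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
        for item in attrs.items():
            attr_counts[item][label] += 1
    return word_counts, attr_counts, label_counts


def classify(model, text, attrs):
    word_counts, attr_counts, label_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    scores = {}
    for label, n_label in label_counts.items():
        score = math.log(n_label / total)  # class prior
        n_words = sum(word_counts[label].values())
        for w in tokenize(text):
            # Textual evidence, with add-one smoothing (an assumption).
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        for item in attrs.items():
            # Non-textual evidence, with a crude smoothing choice (an assumption).
            seen = attr_counts.get(item, Counter())
            score += math.log((seen[label] + 1) / (n_label + 2))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical expert-labelled sample; the classes are disjoint and pre-defined.
sample = [
    ("nuclear reactor safety simulation", {"dept": "physics"}, "energy"),
    ("turbine thermal efficiency study", {"dept": "physics"}, "energy"),
    ("database query optimization", {"dept": "computing"}, "software"),
    ("expert system knowledge base", {"dept": "computing"}, "software"),
]
model = train(sample)
print(classify(model, "reactor thermal simulation", {"dept": "physics"}))
```

The sketch keeps the two kinds of evidence in separate count tables and combines them only at scoring time, which mirrors the abstract's separation of textual and non-textual processing without claiming to reproduce either technique.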