Indexation de textes: l'apprentissage des concepts (Text indexing: learning concepts)

  • Authors:
  • C. Enguehard; P. Malvache; P. Trigano

  • Affiliations:
  • Université de Technologie de Compiègne, Compiègne, France; Commissariat à l'Energie Atomique, Centre d'Etudes de Cadarache, Saint-Paul-Lez-Durance, France; Université de Technologie de Compiègne, Compiègne, France

  • Venue:
  • COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 4
  • Year:
  • 1992

Abstract

In technical fields, many documents go unread because potential readers are unaware that they exist. A system that indexes texts can retrieve the relevant ones in response to a query. The difficulty lies in establishing the index. At present, advanced full-text systems automatically index texts against the complete thesaurus, using computed weights. Alternatively, a person can choose the set of relevant concepts. This second solution is better, but it is more costly and depends on the classification choices made by the operator.

To address these problems, ANA (Automatic Natural Acquisition) has been developed. This system automatically extracts relevant concepts from free texts to produce a semantic network. It does not rely on a grammar or a lexicon but is instead based on an original statistical method.

This research leads to two developments. First, the system is also capable of extracting the simple grammatical structures it encounters most often, in order to improve its performance. Second, this will lead to an automatic definition of semantic classes of concepts, in order to structure the network.