Indexation de textes: l'apprentissage des concepts [Text indexing: learning concepts]

Authors:
C. Enguehard;P. Malvache;P. Trigano
Affiliations:
Université de Technologie de Compiègne, Compiègne, France;Commissariat à L'Energie Atomique, Centre d'Etudes de Cadarache, Saint-Paul-Lez-Durance, France;Université de Technologie de Compiègne, Compiègne, France
Venue:
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 4
Year:
1992



Abstract

In technical fields, many documents go unread because potential readers are unaware that they exist. A system that indexes texts can retrieve all the texts relevant to a query; the problem is how to build the index. At present, advanced full-text systems automatically index texts against the complete thesaurus, using computed weights. Alternatively, a person can choose the set of relevant concepts. This second solution gives better results but is more costly and depends on the classification choices made by the operator.

To address these problems, ANA (Automatic Natural Acquisition) has been developed. This system automatically extracts relevant concepts from free texts to produce a semantic network. It does not rely on a grammar or lexicon but is instead based on an original statistical method.

This research leads to two developments: on the one hand, the system can also extract the simple grammatical structures it encounters, most often in order to improve its performance; on the other hand, it will lead to an automatic definition of semantic classes of concepts, in order to structure the network.
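
The abstract does not describe ANA's statistical method, so the following is only an illustrative sketch of the general idea of extracting candidate concepts from free text by frequency alone and linking them into a small network. It is not the authors' algorithm. The function names `candidate_terms` and `cooccurrence_network`, the choice of word bigrams, and the `min_count` threshold are all assumptions made for this example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words (no lexicon used)."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_terms(texts, min_count=2):
    """Return word bigrams that occur at least min_count times over all texts.

    Hypothetical stand-in for concept extraction: recurrence is the only
    evidence used, so frequent but uninformative pairs can also appear.
    """
    counts = Counter()
    for text in texts:
        words = tokenize(text)
        counts.update(zip(words, words[1:]))
    return {" ".join(pair) for pair, n in counts.items() if n >= min_count}


def cooccurrence_network(texts, terms):
    """Link two candidate terms whenever they appear in the same text."""
    network = defaultdict(set)
    for text in texts:
        # Pad with spaces so that terms only match on whole-word boundaries.
        normalized = " " + " ".join(tokenize(text)) + " "
        present = sorted(t for t in terms if f" {t} " in normalized)
        for i, first in enumerate(present):
            for second in present[i + 1:]:
                network[first].add(second)
                network[second].add(first)
    return dict(network)


if __name__ == "__main__":
    corpus = [
        "The heat exchanger feeds the steam generator.",
        "A steam generator leak stops the heat exchanger.",
        "The pump is idle.",
    ]
    terms = candidate_terms(corpus)
    for term, neighbours in sorted(cooccurrence_network(corpus, terms).items()):
        print(term, "->", sorted(neighbours))
```

A purely frequency-based pass like this one also keeps pairs such as an article followed by a noun, which hints at why a real system needs a more refined statistical criterion than a raw count.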