Symbolic word clustering for medium-size corpora

  • Authors:
  • Benoît Habert;Elie Naulleau;Adeline Nazarenko

  • Affiliations:
  • Equipe de Linguistique Informatique, Ecole Normale Supérieure de Fontenay-St Cloud, Fontenay-aux-Roses;Direction des Etudes et Recherches - Electricité de France, Clamart;Equipe de Linguistique Informatique, Ecole Normale Supérieure de Fontenay-St Cloud, Fontenay-aux-Roses

  • Venue:
  • COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
  • Year:
  • 1996

Abstract

When trying to identify essential concepts and relationships in a medium-size corpus, it is not always possible to rely on statistical methods, as the frequencies are too low. We present an alternative, symbolic method based on the simplification of parse trees. We discuss the results on nominal phrases of two technical corpora, analyzed by two different robust parsers used for terminology updating in an industrial company. We compare our results with Hindle's similarity scores.