Multilingual term extraction from domain-specific corpora using morphological structure

  • Authors:
  • Delphine Bernhard

  • Affiliations:
  • TIMC-IMAG Institut de l'Ingénierie et de l'Information de Santé, LA TRONCHE cedex

  • Venue:
  • EACL '06 Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters & Demonstrations
  • Year:
  • 2006

Abstract

Morphologically complex terms composed from Greek or Latin elements are frequent in scientific and technical texts. Word-forming units are thus relevant cues for the identification of terms in domain-specific texts. This article describes a method for the automatic extraction of terms that relies on the detection of classical prefixes and word-initial combining forms. Word-forming units are identified using a regular expression. The system then extracts terms by selecting words which either begin or coalesce with these elements. Next, terms are grouped into families, which are displayed as a weighted list in HTML format.
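
The pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the regular expression `UNIT_RE`, the function names, and the font-size weighting are all assumptions made for the example. The sketch assumes that a word-forming unit is a word-initial element ending in a, i or o and followed by a hyphen, that a term is any word starting with such a unit (hyphenated or written as one word), and that a family's weight is simply the total frequency of its terms.

```python
import re
from collections import Counter, defaultdict
from html import escape

# Hypothetical pattern (an assumption, not the paper's regex): a word-initial
# element of at least three letters ending in a, i or o, followed by a hyphen,
# e.g. "cardio-" in "cardio-vascular".
UNIT_RE = re.compile(r"\b([a-z]{2,}[aio])-(?=[a-z])")
WORD_RE = re.compile(r"[a-z]+(?:-[a-z]+)*")


def find_units(text):
    """Collect candidate word-forming units from hyphenated words."""
    return set(UNIT_RE.findall(text.lower()))


def extract_families(text, units):
    """Group words by the unit they start with, hyphenated or not."""
    counts = Counter(WORD_RE.findall(text.lower()))
    families = defaultdict(Counter)
    for word, n in counts.items():
        for unit in units:
            rest = word[len(unit):].lstrip("-")
            # Skip the bare unit itself: a term needs material after the unit.
            if word.startswith(unit) and rest:
                families[unit][word] += n
    return families


def to_html(families):
    """Render families as an HTML list, heaviest family first."""
    ranked = sorted(families.items(), key=lambda kv: -sum(kv[1].values()))
    items = []
    for unit, terms in ranked:
        weight = sum(terms.values())
        size = 100 + 20 * weight  # arbitrary scaling, for illustration only
        members = ", ".join(escape(t) for t in sorted(terms))
        items.append(
            f'<li style="font-size:{size}%">'
            f"{escape(unit)}- ({weight}): {members}</li>"
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


sample = ("Cardio-vascular risk and cardiovascular disease; "
          "neuro-muscular and cardiology.")
units = find_units(sample)
families = extract_families(sample, units)
page = to_html(families)
```

On the sample sentence, the unit "cardio" is found through the hyphenated form "cardio-vascular" and then also collects "cardiovascular" and "cardiology", which illustrates the distinction between words that begin with a unit and words in which the unit has coalesced with the rest of the word. A real system would need additional filtering, since plain prefix matching of this kind also picks up unrelated words that happen to share the same initial letters.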