Multilingual term extraction from domain-specific corpora using morphological structure

Authors:
Delphine Bernhard
Affiliations:
TIMC-IMAG Institut de l'Ingénierie et de l'Information de Santé, LA TRONCHE cedex
Venue:
EACL '06 Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters & Demonstrations
Year:
2006

Abstract

Morphologically complex terms composed from Greek or Latin elements are frequent in scientific and technical texts. Word-forming units are thus relevant cues for identifying terms in domain-specific texts. This article describes a method for the automatic extraction of terms that relies on detecting classical prefixes and word-initial combining forms. Word-forming units are identified using a regular expression. The system then extracts terms by selecting words that either begin with these elements or coalesce with them. Next, the terms are grouped into families, which are displayed as a weighted list in HTML format.
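The pipeline summarized above (regex detection of word-forming units, selection of words built on them, grouping into families, weighted HTML output) can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation. The unit inventory `UNITS` is a small hypothetical sample. A "family" is assumed to be all terms sharing the same initial unit, and a family's weight is assumed to be the summed corpus frequency of its members. The helper names `extract_terms`, `group_families` and `to_html` are also hypothetical. Because `UNIT_RE.match` accepts any word whose beginning matches a unit, it keeps both words that begin with a unit and words consisting of the unit alone. How the original method separates those two cases is not specified here.

```python
import re
from collections import Counter, defaultdict
from html import escape

# Hypothetical seed list of classical prefixes and word-initial combining forms.
# The inventory used by the actual method is not given; this is illustrative only.
UNITS = ["anti", "cardi", "gastr", "hepat", "hyper", "hypo", "neur"]

# One regular expression over all units, longest first, so that a longer unit
# wins when one unit is itself a prefix of another.
UNIT_RE = re.compile(
    "(?:" + "|".join(sorted(map(re.escape, UNITS), key=len, reverse=True)) + ")"
)

def extract_terms(text):
    """Count lowercased words whose beginning matches one of the units."""
    counts = Counter()
    # [^\W\d_]+ matches runs of letters only (Unicode-aware, no digits or "_").
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        if UNIT_RE.match(word):  # match() is anchored at the start of the word
            counts[word] += 1
    return counts

def group_families(counts):
    """Group terms by the unit they start with (an assumed family criterion)."""
    families = defaultdict(Counter)
    for word, freq in counts.items():
        unit = UNIT_RE.match(word).group(0)
        families[unit][word] = freq
    return families

def to_html(families, min_px=12, max_px=32):
    """Render families as an HTML list whose font size grows with family weight."""
    if not families:
        return "<ul></ul>"
    weights = {unit: sum(terms.values()) for unit, terms in families.items()}
    lo, hi = min(weights.values()), max(weights.values())
    items = []
    # Heaviest families first; ties broken alphabetically by unit.
    for unit in sorted(weights, key=lambda u: (-weights[u], u)):
        # Linear scaling between min_px and max_px; equal weights all get max_px.
        if hi == lo:
            size = max_px
        else:
            size = min_px + (max_px - min_px) * (weights[unit] - lo) / (hi - lo)
        members = ", ".join(escape(t) for t in sorted(families[unit]))
        items.append(f'<li style="font-size:{size:.0f}px">{escape(unit)}-: {members}</li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

if __name__ == "__main__":
    sample = ("Neuropathy and neurotoxicity were reported; neurotoxicity was severe. "
              "Hepatitis and hypertension were rare.")
    print(to_html(group_families(extract_terms(sample))))
```

On the sample sentence, the `neur-` family (neuropathy, neurotoxicity; weight 3) is rendered at 32px, and the `hepat-` and `hyper-` families (weight 1 each) at 12px. Plain prefix matching also picks up accidental look-alikes such as "antiquity" for `anti`, so a real system would need further filtering.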