An Improved Automatic Term Recognition Method for Spanish

  • Authors:
  • Alberto Barrón-Cedeño;Gerardo Sierra;Patrick Drouin;Sophia Ananiadou

  • Affiliations:
  • Engineering Institute, Universidad Nacional Autónoma de México, Mexico and Department of Information Systems and Computation, Universidad Politécnica de Valencia, Spain;Engineering Institute, Universidad Nacional Autónoma de México, Mexico;Observatoire de Linguistique Sens-Texte, Université de Montréal, Canada;University of Manchester and National Centre for Text Mining, UK

  • Venue:
  • CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2009

Abstract

The $C\mbox{-}value/NC\mbox{-}value$ algorithm, a hybrid approach to automatic term recognition, was originally developed to extract multiword term candidates from specialised documents written in English. Here, we present three main modifications to this algorithm that affect how its output is refined. The first modification aims to maximise the number of real terms in the list of candidates through a new approach to applying the stop-list. The second modification adapts the $C\mbox{-}value$ formula so that single-word terms are also considered. The third modification changes how term candidates are grouped, exploiting a lemmatised version of the input corpus. Additionally, the size of the candidates' context window is made variable. We also describe the linguistic modifications needed to apply this algorithm to the recognition of term candidates in Spanish.