An Improved Automatic Term Recognition Method for Spanish

Authors:
Alberto Barrón-Cedeño;Gerardo Sierra;Patrick Drouin;Sophia Ananiadou
Affiliations:
Engineering Institute, Universidad Nacional Autónoma de México, Mexico and Department of Information Systems and Computation, Universidad Politécnica de Valencia, Spain;Engineering Institute, Universidad Nacional Autónoma de México, Mexico;Observatoire de Linguistique Sens-Texte, Université de Montréal, Canada;University of Manchester and National Centre for Text Mining, UK
Venue:
CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
Year:
2009

Abstract

The C-value/NC-value algorithm, a hybrid approach to automatic term recognition, was originally developed to extract multiword term candidates from specialised documents written in English. Here, we present three main modifications to this algorithm that affect how the obtained output is refined. The first modification aims to maximise the number of real terms in the list of candidates through a new approach to applying the stop-list. The second modification adapts the C-value formula so that it also considers single-word terms. The third modification changes how the term candidates are grouped, exploiting a lemmatised version of the input corpus. Additionally, the size of each candidate's context window is variable. We also show the linguistic modifications needed to apply this algorithm to the recognition of term candidates in Spanish.
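
To illustrate why the C-value formula needs adapting for single-word terms, the following minimal sketch computes a simplified C-value in which the length factor is log2 of the candidate's length in words. For a one-word candidate that factor is log2(1) = 0, so its score is always zero. The abstract does not state the adapted formula, so the `offset` parameter below is a hypothetical stand-in (an additive constant on the length factor), not the authors' actual modification. The function names, the toy frequencies, and the omission of the stop-list, linguistic filter, and NC-value steps are likewise assumptions made for illustration.

```python
import math
from collections import defaultdict


def is_nested(short, long_):
    """Return True if the word tuple `short` occurs contiguously in `long_`."""
    n = len(short)
    return any(long_[i:i + n] == short for i in range(len(long_) - n + 1))


def c_value(freqs, offset=0.0):
    """Simplified C-value over candidate terms.

    freqs  -- dict mapping a candidate (tuple of words) to its frequency
    offset -- HYPOTHETICAL constant added to log2(length); 0.0 gives the
              classic length factor, which is zero for single words
    """
    # For each candidate, collect the longer candidates that contain it.
    containers = defaultdict(list)
    for a in freqs:
        for b in freqs:
            if len(b) > len(a) and is_nested(a, b):
                containers[a].append(b)

    scores = {}
    for a, freq in freqs.items():
        weight = offset + math.log2(len(a))
        longer = containers[a]
        if longer:
            # Discount by the mean frequency of the containing candidates.
            mean_container_freq = sum(freqs[b] for b in longer) / len(longer)
            scores[a] = weight * (freq - mean_container_freq)
        else:
            scores[a] = weight * freq
    return scores


# Toy frequencies, invented for illustration only.
toy = {
    ("sistema",): 10,
    ("sistema", "operativo"): 4,
    ("sistema", "operativo", "distribuido"): 2,
}
classic = c_value(toy)               # single-word candidate scores 0.0
adapted = c_value(toy, offset=1.0)   # single-word candidate now gets a score
```

With `offset=0.0` the one-word candidate can never be ranked, however frequent it is. Any positive offset lets it compete with multiword candidates, which is the kind of effect the second modification is after.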