Improving Portuguese term extraction

Authors:
Lucelene Lopes; Renata Vieira
Affiliations:
Faculdade de Informática, FACIN, PUCRS, Porto Alegre, RS, Brazil; Faculdade de Informática, FACIN, PUCRS, Porto Alegre, RS, Brazil
Venue:
PROPOR'12 Proceedings of the 10th international conference on Computational Processing of the Portuguese Language
Year:
2012

Abstract

This paper evaluates a set of heuristics for improving the quality of terms extracted from an annotated domain corpus written in Portuguese. The heuristics start from the part-of-speech and grammatical-function annotation of the texts and identify the nouns and noun phrases that are the best candidates for domain terms. These candidates are then submitted to a set of approximate rules (the heuristics), which may discard a candidate, accept it (with or without removing some of its words), or discover implicit terms that can be inferred. The effectiveness of the heuristics is verified in a corpus experiment, in which the usual metrics are computed against a reference list.
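
As a rough illustration of the three kinds of rule outcome the abstract describes (discard, accept with words removed, discover implicit terms) and of scoring against a reference list, a minimal Python sketch follows. Everything in it is an assumption made for illustration: the article list, the digit filter, the toy coordination rule, and the function names `refine` and `evaluate` are hypothetical and are not taken from the paper. The abstract also does not name its metrics, so precision, recall, and F-measure are assumed here.

```python
import re

# Assumed list of Portuguese articles to trim (illustrative, not from the paper).
ARTICLES = {"o", "a", "os", "as", "um", "uma", "uns", "umas"}


def refine(candidate: str) -> list[str]:
    """Return zero, one, or several terms for one noun-phrase candidate."""
    words = candidate.lower().split()
    # Accept with removal: drop a leading article.
    if words and words[0] in ARTICLES:
        words = words[1:]
    # Discard: nothing left, or some word contains a digit.
    if not words or any(re.search(r"\d", w) for w in words):
        return []
    # Discover: toy rule that splits "noun adj e adj" into two implicit terms.
    if len(words) == 4 and words[2] == "e":
        head, adj1, _, adj2 = words
        return [f"{head} {adj1}", f"{head} {adj2}"]
    return [" ".join(words)]


def evaluate(extracted: set[str], reference: set[str]) -> tuple[float, float, float]:
    """Precision, recall, and F-measure of extracted terms against a reference list."""
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

In this sketch, `refine` is applied to each candidate noun phrase, the resulting terms are collected into a set, and that set is passed to `evaluate` together with the reference list. A real system would operate on the annotated parse rather than on raw strings.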