HUMB: Automatic key term extraction from scientific articles in GROBID

  • Authors: Patrice Lopez; Laurent Romary

  • Affiliations: INRIA, Berlin, Germany; INRIA & HUB-IDSL, Berlin, Germany

  • Venue: SemEval '10: Proceedings of the 5th International Workshop on Semantic Evaluation
  • Year: 2010

Abstract

SemEval-2010 Task 5 was an opportunity to experiment with the key term extraction module of GROBID, a system for extracting and generating bibliographical information from technical and scientific documents. The tool first uses GROBID's facilities for analyzing the structure of scientific articles, which yields a first set of structural features. A second set of features captures content properties based on phraseness, informativeness and keywordness measures. Two knowledge bases, GRISP and Wikipedia, are then exploited to produce a last set of lexical/semantic features. Bagged decision trees appeared to be the most efficient machine learning algorithm for generating a ranked list of key term candidates. Finally, a post-ranking step was applied, based on co-usage statistics of keywords in HAL, a large Open Access publication repository.
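
To make the ranking step in the abstract more concrete, the following is a minimal sketch of bagging applied to key term candidates. It is not GROBID's implementation. It assumes depth-1 trees (decision stumps) in place of full decision trees, two made-up numeric features per candidate, and a tiny hand-written training set. The function names (fit_stump, fit_bagged_stumps, rank_candidates) and all feature values are hypothetical, chosen only for illustration.

```python
import random
from statistics import mean


def fit_stump(rows, labels):
    """Fit a depth-1 decision tree (stump) that minimises misclassifications.

    Returns (feature_index, threshold, left_value, right_value); the leaf
    values are the fraction of positive labels on each side of the split.
    """
    base = mean(labels)
    best = (None, 0.0, base, base)  # fallback: constant prediction
    best_err = min(sum(labels), len(labels) - sum(labels))
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            err = (min(sum(left), len(left) - sum(left))
                   + min(sum(right), len(right) - sum(right)))
            if err < best_err:
                best_err = err
                best = (f, thr, mean(left), mean(right))
    return best


def stump_score(stump, row):
    """Score one candidate with one stump (value of the leaf it falls in)."""
    f, thr, left_value, right_value = stump
    if f is None or row[f] <= thr:
        return left_value
    return right_value


def fit_bagged_stumps(rows, labels, n_estimators=25, seed=0):
    """Bagging: fit each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx]))
    return stumps


def rank_candidates(stumps, candidates):
    """Average the stump scores per candidate and sort best-first."""
    scored = [(term, mean([stump_score(s, feats) for s in stumps]))
              for term, feats in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: (feature_a, feature_b) per candidate, label 1 = key term.
train_rows = [(0.9, 1.0), (0.8, 0.7), (0.7, 0.9), (1.0, 0.8),
              (0.1, 0.0), (0.2, 0.3), (0.3, 0.1), (0.0, 0.2)]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

candidates = {
    "key term extraction": (0.9, 0.9),
    "decision trees": (0.5, 0.5),
    "this paper": (0.1, 0.1),
}

stumps = fit_bagged_stumps(train_rows, train_labels)
ranking = rank_candidates(stumps, candidates)
for term, score in ranking:
    print(f"{score:.2f}  {term}")
```

Averaging many weak trees trained on resampled data gives each candidate a graded score rather than a hard yes/no decision, which is what makes the output usable as a ranked list. A post-ranking step such as the co-usage statistics mentioned in the abstract would then adjust these scores; that step is not shown here.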