Automatic term extraction using log-likelihood based comparison with general reference corpus

  • Authors:
  • Alexander Gelbukh; Grigori Sidorov; Eduardo Lavin-Villa; Liliana Chanona-Hernandez

  • Affiliations:
  • Center for Computing Research, National Polytechnic Institute, Mexico, DF, Mexico; Center for Computing Research, National Polytechnic Institute, Mexico, DF, Mexico; Center for Computing Research, National Polytechnic Institute, Mexico, DF, Mexico; Engineering Faculty, National Polytechnic Institute, Mexico, DF, Mexico

  • Venue:
  • NLDB'10 Proceedings of the Natural language processing and information systems, and 15th international conference on Applications of natural language to information systems
  • Year:
  • 2010

Abstract

In this paper we present a method for extracting single-word terms for a specific domain. At the next stage, these terms can be used as candidates for multi-word term extraction. The proposed method is based on comparison with a general reference corpus using log-likelihood similarity. We also cluster the extracted terms using the k-means algorithm and the cosine similarity measure. We conducted experiments on texts from the computer science domain. The resulting term list is analyzed in detail.
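
The comparison against a reference corpus can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes the commonly used two-corpus log-likelihood (G2) statistic; the function names `log_likelihood` and `extract_terms`, the `top_n` parameter, and the rule of keeping only words that are relatively more frequent in the domain corpus are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Two-corpus G2 statistic (assumed formulation, not taken from the paper).

    a: frequency of the word in the domain corpus
    b: frequency of the word in the reference corpus
    c: total number of tokens in the domain corpus
    d: total number of tokens in the reference corpus
    """
    if c <= 0 or d <= 0:
        raise ValueError("both corpora must be non-empty")
    # Expected frequencies if the word were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def extract_terms(domain_tokens, reference_tokens, top_n=10):
    """Rank single-word term candidates by G2 score (hypothetical helper)."""
    dom = Counter(domain_tokens)
    ref = Counter(reference_tokens)
    c = sum(dom.values())
    d = sum(ref.values())
    scored = []
    for word, a in dom.items():
        b = ref.get(word, 0)
        # Keep only words that are relatively more frequent in the domain corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


domain = "the compiler parses the source code and the compiler emits code".split()
reference = "the cat sat on the mat and the dog ran".split()
candidates = extract_terms(domain, reference, top_n=5)
```

In this toy input, function words such as "the" are filtered out because they are no more frequent in the domain text than in the reference text, while "compiler" and "code" rank highest. A real experiment would use a large general corpus and some form of tokenization and normalization, which this sketch leaves out.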