Automatic term extraction using log-likelihood based comparison with general reference corpus

Authors:
Alexander Gelbukh;Grigori Sidorov;Eduardo Lavin-Villa;Liliana Chanona-Hernandez
Affiliations:
Center for Computing Research, National Polytechnic Institute, Mexico, DF, Mexico;Center for Computing Research, National Polytechnic Institute, Mexico, DF, Mexico;Center for Computing Research, National Polytechnic Institute, Mexico, DF, Mexico;Engineering Faculty, National Polytechnic Institute, Mexico, DF, Mexico
Venue:
NLDB'10 Proceedings of the Natural language processing and information systems, and 15th international conference on Applications of natural language to information systems
Year:
2010

Citing 4
Cited 0

Ontological Engineering
Accurate methods for the statistics of surprise and coincidence (Computational Linguistics - Special issue on using large corpora: I)
Ontology Learning and Population from Text: Algorithms, Evaluation and Applications
Knowledge-based methods for automatic extraction of domain-specific ontologies

Abstract

We present a method for extracting single-word terms for a specific domain. These terms can subsequently serve as candidates for multi-word term extraction. The method compares the domain texts with a general reference corpus using log-likelihood similarity. We also cluster the extracted terms using the k-means algorithm with the cosine similarity measure. We ran experiments on texts from the computer science domain and analyze the resulting term list in detail.
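
To illustrate the kind of comparison the abstract describes, the following is a minimal sketch of scoring single words by a log-likelihood statistic against a reference corpus. It is not the authors' implementation: the abstract does not give the exact formula, so the sketch assumes a common two-term form of the log-likelihood statistic for corpus comparison, and the names `log_likelihood` and `extract_terms`, the whitespace tokenization, and the rule of keeping only words that are relatively more frequent in the domain corpus are all illustrative assumptions.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood score for one word (assumed two-term form).

    a: count of the word in the domain corpus
    b: count of the word in the reference corpus
    c: total tokens in the domain corpus
    d: total tokens in the reference corpus
    """
    # Expected counts if the word were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / e1)
    if b > 0:
        score += b * math.log(b / e2)
    return 2.0 * score


def extract_terms(domain_tokens, reference_tokens, top_n=10):
    """Rank domain words by log-likelihood against a reference corpus."""
    dom = Counter(domain_tokens)
    ref = Counter(reference_tokens)
    c, d = sum(dom.values()), sum(ref.values())
    if c == 0 or d == 0:
        return []
    scored = []
    for word, a in dom.items():
        b = ref.get(word, 0)
        # Assumption: keep only words over-represented in the domain corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


if __name__ == "__main__":
    # Toy corpora, invented for illustration only.
    domain = "the compiler parses the code and the compiler emits code".split()
    reference = "the cat sat on the mat and the dog ran".split()
    for word, score in extract_terms(domain, reference):
        print(f"{word}\t{score:.3f}")
```

On real data the reference corpus would be far larger than the domain corpus, and function words such as "the" would typically score near zero because their relative frequencies are similar in both. The ranked list could then be passed to a clustering step such as the k-means procedure mentioned in the abstract.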