Automatic term extraction based on perplexity of compound words

Authors:
Minoru Yoshida; Hiroshi Nakagawa
Affiliations:
Information Technology Center, University of Tokyo, Tokyo; Information Technology Center, University of Tokyo, Tokyo
Venue:
IJCNLP'05: Proceedings of the Second International Joint Conference on Natural Language Processing
Year:
2005

Abstract

Many term extraction methods have been evaluated for their accuracy on very large corpora. However, when frequency-based methods are applied to a small corpus, they may not achieve sufficient accuracy because the frequency statistics are too sparse. This paper reports a new term extraction method tuned for very small corpora. It focuses on the structure of compound terms and calculates the perplexity on the left and right sides of each term unit. Our experiments showed that the proposed method alone offered no clear advantage in accuracy. However, a method combining perplexity with frequency information achieved the highest average precision among the methods compared.
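
The left-side and right-side perplexity of a term unit can be illustrated with a minimal Python sketch. It is only one possible reading of the abstract, not the authors' implementation. It assumes that compound terms are already segmented into lists of term units, that perplexity is taken as 2 raised to the entropy of the neighbouring-unit distribution, and that a side with no neighbours has perplexity 1.0. The function names `perplexity` and `side_perplexities` and the toy compounds are hypothetical.

```python
import math
from collections import Counter, defaultdict


def perplexity(counts):
    """Perplexity (2 ** entropy) of the distribution given by raw counts."""
    total = sum(counts.values())
    if total == 0:
        # Assumption: a unit with no neighbours on this side gets 1.0.
        return 1.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 2 ** entropy


def side_perplexities(compounds):
    """Map each term unit to (left perplexity, right perplexity).

    `compounds` is a list of compound terms, each a list of term units.
    """
    left = defaultdict(Counter)   # unit -> counts of units seen to its left
    right = defaultdict(Counter)  # unit -> counts of units seen to its right
    for units in compounds:
        for i, unit in enumerate(units):
            if i > 0:
                left[unit][units[i - 1]] += 1
            if i < len(units) - 1:
                right[unit][units[i + 1]] += 1
    seen = {unit for units in compounds for unit in units}
    return {u: (perplexity(left[u]), perplexity(right[u])) for u in seen}


# Toy data, invented for illustration only.
compounds = [
    ["information", "retrieval"],
    ["information", "extraction"],
    ["term", "extraction"],
    ["information", "retrieval", "system"],
]
scores = side_perplexities(compounds)
```

In this toy data, "extraction" follows two different units equally often, so its left perplexity is higher than that of a unit that always follows the same word. How such per-unit values are aggregated into a score for a whole compound, and how they are combined with frequency, is not specified in the abstract and is left out of the sketch.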