Abbreviation generation for Japanese multi-word expressions

Authors:
Hiromi Wakaki; Hiroko Fujii; Masaru Suzuki; Mika Fukui; Kazuo Sumita
Affiliations:
Toshiba Corporation, Saiwai-ku, Kawasaki, Japan (all authors)
Venue:
MWE '09 Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications
Year:
2009

Abstract

This paper proposes a novel method that generates Japanese abbreviations from their full forms using a Log-Linear Model (LLM), exploiting the characteristic patterns of Japanese abbreviations. Experimental results show that the method is effective for TV program titles containing colloquial expressions: it achieved 78.8% recall within the top 30 candidates, whereas a baseline based on Conditional Random Fields (CRFs) achieved 68.3%. Moreover, experiments on six data sets, classified by character type and semantic category, show that the performance of each method depends on the type of full form.
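The abstract names the model but not its features, so what follows is only a hypothetical sketch of how log-linear abbreviation generation and top-k recall could fit together. It is not the authors' implementation. Several assumptions apply. The full form is already segmented into words. Every order-preserving character subsequence counts as a candidate. Candidates are scored as p(a | f) = exp(w · φ(f, a)) / Σ exp(w · φ(f, a')) using three invented features (`word_initial_kept`, `prefix_only`, `length`), and hand-set weights stand in for trained ones. The sketch works on characters only and ignores readings. Exhaustive enumeration is exponential in the number of characters, so it only suits short full forms. The CRF baseline is not modelled.

```python
import itertools
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical hand-set weights for illustration only; a real log-linear model
# would learn them from (full form, abbreviation) training pairs.
WEIGHTS: Dict[str, float] = {
    "word_initial_kept": 2.0,  # reward keeping the first character of a word
    "prefix_only": 1.0,        # reward words whose kept characters form a prefix
    "length": -0.5,            # penalise longer candidates
}


def features(words: Sequence[str], mask: Sequence[bool]) -> Dict[str, float]:
    """Hypothetical features of one keep/drop mask over the full form's characters.

    Assumes every word is a non-empty string.
    """
    feats = {"word_initial_kept": 0.0, "prefix_only": 0.0, "length": float(sum(mask))}
    pos = 0
    for word in words:
        kept = mask[pos:pos + len(word)]
        n_kept = sum(kept)
        if kept[0]:
            feats["word_initial_kept"] += 1.0
        # e.g. keeping only 東 from 東京 is a prefix; keeping only 京 is not
        if n_kept > 0 and all(kept[:n_kept]):
            feats["prefix_only"] += 1.0
        pos += len(word)
    return feats


def generate(words: Sequence[str], top_k: int = 30) -> List[Tuple[str, float]]:
    """Rank character subsequences of the full form by p(a | f) = exp(w . phi) / Z."""
    chars = "".join(words)
    scores: List[Tuple[str, float]] = []
    for mask in itertools.product([False, True], repeat=len(chars)):
        n = sum(mask)
        if n < 2 or n == len(chars):  # keep at least two characters, drop at least one
            continue
        phi = features(words, mask)
        score = sum(WEIGHTS[name] * value for name, value in phi.items())
        abbr = "".join(c for c, keep in zip(chars, mask) if keep)
        scores.append((abbr, score))
    z = math.fsum(math.exp(s) for _, s in scores)  # partition function over all candidates
    probs: Dict[str, float] = defaultdict(float)
    for abbr, s in scores:
        probs[abbr] += math.exp(s) / z  # merge masks that yield the same string
    return sorted(probs.items(), key=lambda item: -item[1])[:top_k]


def recall_at_k(examples: Sequence[Tuple[Sequence[str], str]], k: int = 30) -> float:
    """Share of full forms whose gold abbreviation is among the top-k candidates."""
    hits = sum(1 for words, gold in examples if gold in {a for a, _ in generate(words, k)})
    return hits / len(examples)
```

With these made-up weights, `generate(["東京", "大学"])` ranks 東大 first. `recall_at_k` takes a list of (segmented full form, gold abbreviation) pairs. It computes top-k recall on the assumption that a full form counts as correct when its reference abbreviation appears among the top k candidates.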