Online acquisition of Japanese unknown morphemes using morphological constraints

Authors:
Yugo Murawaki; Sadao Kurohashi
Affiliations:
Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto, Japan; Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto, Japan
Venue:
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Year:
2008

Abstract

We propose a novel lexicon acquirer that works in concert with a morphological analyzer and can run in online mode. Every time a sentence is analyzed, the acquirer detects unknown morphemes, enumerates candidates, and selects the best candidates by comparing multiple examples kept in storage. When a morpheme is unambiguously selected, the acquirer adds it to the analyzer's dictionary, where it is used in subsequent analysis. By exploiting the constraints of Japanese morphology, we effectively reduce the number of examples required to acquire a morpheme. Experiments show that unknown morphemes were acquired with high accuracy and that their acquisition improved the quality of morphological analysis.
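
The loop described in the abstract (detect, enumerate, compare stored examples, update the dictionary) can be illustrated with a small toy sketch. This is not the authors' implementation. The class name `ToyAcquirer`, the romanized strings, the greedy longest-match analyzer, the `min_examples` threshold, and the use of "a candidate stem must be followed by a known suffix" as a stand-in for a morphological constraint are all assumptions made for illustration.

```python
from collections import defaultdict


class ToyAcquirer:
    """Toy online lexicon acquirer (illustrative only, not the paper's method)."""

    def __init__(self, lexicon, suffixes, min_examples=2):
        self.lexicon = set(lexicon)        # known morphemes (toy data)
        self.suffixes = set(suffixes)      # known suffixes, used as the constraint
        self.min_examples = min_examples   # hypothetical acquisition threshold
        self.storage = defaultdict(set)    # candidate stem -> suffixes seen after it

    def first_unknown(self, sentence):
        """Return the index where greedy longest-match analysis fails, or None."""
        known = self.lexicon | self.suffixes
        i = 0
        while i < len(sentence):
            match = max((w for w in known if sentence.startswith(w, i)),
                        key=len, default=None)
            if match is None:
                return i
            i += len(match)
        return None

    def candidates(self, sentence, start):
        """Enumerate (stem, suffix) pairs: a stem from `start` followed by a known suffix."""
        pairs = []
        for end in range(start + 1, len(sentence)):
            for suffix in self.suffixes:
                if sentence.startswith(suffix, end):
                    pairs.append((sentence[start:end], suffix))
        return pairs

    def process(self, sentence):
        """Analyze one sentence; return a newly acquired morpheme or None."""
        start = self.first_unknown(sentence)
        if start is None:
            return None                    # everything was analyzable
        for stem, suffix in self.candidates(sentence, start):
            self.storage[stem].add(suffix)
        # A candidate is "ready" once it was seen with enough distinct suffixes.
        ready = [s for s, seen in self.storage.items()
                 if len(seen) >= self.min_examples]
        if len(ready) != 1:
            return None                    # still ambiguous: wait for more examples
        stem = ready[0]
        self.lexicon.add(stem)             # update the analyzer's dictionary
        for other in [c for c in self.storage if c.startswith(stem)]:
            del self.storage[other]        # drop examples explained by the new entry
        return stem


acquirer = ToyAcquirer(lexicon={"kinou"}, suffixes={"ru", "ta", "tta"})
first = acquirer.process("gugutta")    # candidates "gugu"+"tta" and "gugut"+"ta"
second = acquirer.process("guguru")    # second example supports only "gugu"
third = acquirer.process("guguta")     # now analyzable with the updated lexicon
```

In this toy run the first sentence leaves two competing candidates in storage, the second sentence supplies the extra example that singles out one of them, and the third sentence is analyzed without any unknown span because the dictionary has been updated. A single example is ambiguous here only because the suffix inventory allows two splits, which is the intuition behind comparing multiple examples before committing to an entry.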