Extended models and tools for high-performance part-of-speech tagger

  • Authors:
  • Masayuki Asahara; Yuji Matsumoto

  • Affiliations:
  • Nara Institute of Science and Technology, Nara, Japan; Nara Institute of Science and Technology, Nara, Japan

  • Venue:
  • COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
  • Year:
  • 2000


Abstract

Statistical part-of-speech (POS) taggers achieve high accuracy and robustness when trained on large-scale manually tagged corpora. However, enhancements of the learning models are necessary to achieve better performance. We are developing a learning tool for ChaSen, a Japanese morphological analyzer. We currently use a fine-grained POS tag set of about 500 tags. Applying a standard tri-gram model to this tag set would require an unrealistically large corpus; even for a bi-gram model, we cannot prepare an annotated corpus of moderate size if all tags are treated as distinct. A usual technique for coping with such fine-grained tags is to reduce the size of the tag set by grouping the tags into equivalence classes. We introduce the concept of position-wise grouping, in which the tag set is partitioned into different equivalence classes at each position in the conditional probabilities of the Markov model. Moreover, to cope with the data-sparseness problem caused by exceptional phenomena, we introduce several other techniques, such as word-level statistics, smoothing of word-level and POS-level statistics, and a selective tri-gram model. To help users determine probabilistic parameters, we introduce an error-driven method for parameter selection. Finally, we report experimental results showing the effect of these tools applied to an existing Japanese morphological analyzer.
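The idea of position-wise grouping can be sketched as follows: the tag in the conditioning (context) position of a bi-gram probability is mapped to one equivalence class, while the tag in the predicted position may be mapped to a different, possibly finer, partition. This is a minimal illustrative sketch, not ChaSen's actual implementation; the tag names and the two grouping tables are hypothetical stand-ins for the paper's 500-tag set.

```python
from collections import defaultdict

# Hypothetical partitions of a fine-grained tag set. The context position
# uses a coarser grouping than the predicted position, so the same tag
# set yields different equivalence classes at each position.
GROUP_CONTEXT = {"noun-common": "noun", "noun-proper": "noun",
                 "verb-base": "verb", "verb-past": "verb",
                 "particle-case": "particle"}
GROUP_PREDICT = {"noun-common": "noun-common", "noun-proper": "noun-proper",
                 "verb-base": "verb", "verb-past": "verb",
                 "particle-case": "particle-case"}

def train_bigram(tagged_sents):
    """Estimate P(class_pred(t_i) | class_ctx(t_{i-1})) by relative frequency.

    tagged_sents: list of sentences, each a list of (word, tag) pairs.
    Returns a nested dict: probs[context_class][predicted_class] -> probability.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for sent in tagged_sents:
        tags = ["BOS"] + [tag for _, tag in sent]
        for prev, cur in zip(tags, tags[1:]):
            ctx = GROUP_CONTEXT.get(prev, prev)    # coarse class in context position
            pred = GROUP_PREDICT.get(cur, cur)     # finer class in predicted position
            counts[ctx][pred] += 1
    probs = {}
    for ctx, outcomes in counts.items():
        total = sum(outcomes.values())
        probs[ctx] = {pred: c / total for pred, c in outcomes.items()}
    return probs
```

Because each position has its own partition, the model keeps distinctions where they matter for prediction while shrinking the context space, which is the point of position-wise grouping as opposed to a single global reduction of the tag set.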