Statistical part-of-speech (POS) taggers achieve high accuracy and robustness when trained on large, manually tagged corpora. However, the learning models must be enhanced to achieve better performance. We are developing a learning tool for ChaSen, a Japanese morphological analyzer. We currently use a fine-grained POS tag set with about 500 tags. Applying an ordinary tri-gram model to this tag set would require an unrealistically large corpus. Even for a bi-gram model, we cannot prepare a sufficiently large annotated corpus if we treat all the tags as distinct. A common technique for coping with such fine-grained tags is to reduce the size of the tag set by grouping the tags into equivalence classes. We introduce the concept of position-wise grouping, in which the tag set is partitioned into different equivalence classes at each position in the conditional probabilities of the Markov model. Moreover, to cope with the data sparseness problem caused by exceptional phenomena, we introduce several other techniques, such as word-level statistics, smoothing of word-level and POS-level statistics, and a selective tri-gram model. To help users determine probabilistic parameters, we introduce an error-driven method for parameter selection. We then report experiments that evaluate the effect of these tools when applied to an existing Japanese morphological analyzer.
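The sparseness argument and the idea of position-wise grouping can be illustrated with a small sketch. This is not ChaSen's implementation. The tag names, the toy corpus, the helper names (`ngram_table_size`, `train_grouped_bigram`, `major_category`) and the particular factorization P(t | prev) = P(class_cur(t) | class_prev(prev)) * P(t | class_cur(t)) are all assumptions made for illustration. The only figure taken from the abstract is the tag set size of about 500.

```python
from collections import Counter


def ngram_table_size(n_tags, order):
    """Number of tag n-grams a full table of the given order must cover."""
    return n_tags ** order


def train_grouped_bigram(tag_sequences, group_prev, group_cur):
    """Bigram tag model with a separate grouping for each position.

    group_prev maps the conditioning tag to its equivalence class;
    group_cur maps the predicted tag to its (possibly different) class.
    Assumed factorization (illustrative only):
        P(t | prev) = P(group_cur(t) | group_prev(prev)) * P(t | group_cur(t))
    """
    pair_counts = Counter()  # (conditioning class, predicted class)
    prev_counts = Counter()  # conditioning class
    tag_counts = Counter()   # individual tags
    cur_counts = Counter()   # predicted class

    for tags in tag_sequences:
        for prev, cur in zip(tags, tags[1:]):
            pair_counts[(group_prev(prev), group_cur(cur))] += 1
            prev_counts[group_prev(prev)] += 1
        for tag in tags:
            tag_counts[tag] += 1
            cur_counts[group_cur(tag)] += 1

    def prob(cur, prev):
        gp, gc = group_prev(prev), group_cur(cur)
        if prev_counts[gp] == 0 or cur_counts[gc] == 0:
            return 0.0  # unseen class; a real tagger would smooth here
        class_transition = pair_counts[(gp, gc)] / prev_counts[gp]
        tag_given_class = tag_counts[cur] / cur_counts[gc]
        return class_transition * tag_given_class

    return prob


def major_category(tag):
    """Hypothetical coarse class: the part before the first hyphen."""
    return tag.split("-")[0]


# Hypothetical toy corpus of tag sequences (not real ChaSen tags).
corpus = [
    ["noun-common", "particle-case", "verb"],
    ["noun-proper", "particle-case", "verb"],
    ["noun-common", "verb"],
]

# Coarse classes at the conditioning position, full tags at the predicted one.
prob = train_grouped_bigram(corpus, group_prev=major_category,
                            group_cur=lambda tag: tag)

bigram_cells = ngram_table_size(500, 2)   # 500 * 500
trigram_cells = ngram_table_size(500, 3)  # 500 * 500 * 500
```

In the toy corpus, `noun-proper` is never followed by `verb`, yet `prob("verb", "noun-proper")` is non-zero because the conditioning position pools both noun tags into one class. That pooling is the effect grouping is meant to provide when fine-grained counts are too sparse.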