A Stochastic Technique to Obtain Training Data for Word Segmentation

Authors:
Takuya Fukuda;Takao Miura
Venue:
WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03
Year:
2009

Citing 2
Cited 0

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning

Word Segmentation Using Domain Knowledge Based on Conditional Random Fields
ICTAI '07 Proceedings of the 19th IEEE International Conference on Tools with Artificial Intelligence - Volume 02

Abstract

Unlike Western languages, Japanese text has no explicit word boundaries, which often makes Japanese documents hard to analyze. The difficulty is greater in specialized domains such as medicine, mechanical engineering, and computer science. In this work, we discuss how to obtain a pseudo test corpus using a Markov chain Monte Carlo (MCMC) method, given only a small amount of test data. In this setting, we report good results with our approach.