A limited memory algorithm for bound constrained optimization
SIAM Journal on Scientific Computing
A maximum entropy approach to natural language processing
Computational Linguistics
Computational Linguistics
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
A phonetic similarity model for automatic extraction of transliteration pairs
ACM Transactions on Asian Language Information Processing (TALIP)
Hi-index | 0.00 |
We demonstrate the use of context features, namely, names of places, and unlabelled data for the detection of personal name language of origin. While some early work used either rule-based methods or n-gram statistical models to determine the name language of origin, we use the discriminative classification maximum entropy model and view the task as a classification task. We perform bootstrapping of the learning using list of names out of context but with known origin and then using expectation-maximisation algorithm to further train the model on a large corpus of names of unknown origin but with context features. Using a relatively small unlabelled corpus we improve the accuracy of name origin recognition for names written in Chinese from 82.7% to 85.8%, a significant reduction in the error rate. The improvement in F-score for infrequent Japanese names is even greater: from 77.4% without context features to 82.8% with context features.