Named entity recognition in inherent-vowel alphabetic languages such as Burmese, Khmer, Lao, Tamil, Telugu, Balinese, and Thai is difficult because there are no explicit boundaries between words or sentences. This paper presents a novel method that exploits the concept of character clusters, sequences of inseparable characters: it groups characters into clusters and uses statistics over characters and their clusters to extract Thai words and recognize named entities simultaneously. The method integrates two phases, a word-segmentation model and a named-entity-recognition model. Context features are used to learn the parameters of these two discriminative probabilistic models, both conditional random fields (CRFs), which rank the generated word and named-entity candidates. Experimental results show that the method significantly improves word segmentation and named entity recognition, achieving F-measures of 96.14% and 83.68%, respectively.
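
The abstract reports F-measures but does not say how they were computed. As a minimal sketch, assuming the standard balanced F-measure (the harmonic mean of precision and recall) and assuming a predicted word counts as correct only when its character span exactly matches a gold span, segmentation quality could be scored as follows. The helper names `spans` and `f_measure` are hypothetical and are not taken from the paper.

```python
def spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    out, pos = set(), 0
    for w in words:
        out.add((pos, pos + len(w)))
        pos += len(w)
    return out


def f_measure(gold_words, pred_words):
    """Balanced F-measure over exact-match word spans (an assumed scoring scheme)."""
    gold, pred = spans(gold_words), spans(pred_words)
    correct = len(gold & pred)  # spans present in both segmentations
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example with Latin placeholders instead of Thai text:
# only the first word "ab" has the same span in both segmentations.
score = f_measure(["ab", "cd", "e"], ["ab", "c", "de"])
```

The same function could score named entities if each entity were represented by its span (optionally paired with its type), although the paper may use a different matching criterion.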