Word extraction based on semantic constraints in Chinese word-formation
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
This paper presents a novel approach to Chinese disyllabic word extraction based on the semantic information of characters. Two thesauri of Chinese characters, one manually crafted and one machine-generated, are constructed. A Chinese wordlist of 63,738 two-character words is used together with the character thesauri to learn semantic constraints between characters in Chinese word-formation, resulting in two types of semantic-tag-based HMM. Experiments show that (1) both schemes outperform their character-based counterpart; (2) the machine-generated thesaurus outperforms the hand-crafted one to some extent in word extraction; and (3) a proper combination of semantic-tag-based and character-based methods could benefit word extraction.
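The abstract does not spell out how the semantic-tag-based HMMs are defined, so the following is only a minimal sketch of the general idea, not the paper's model. It assumes a two-state HMM in which each character is emitted by a semantic tag, with start, transition, and emission probabilities estimated from a wordlist using add-one smoothing. The thesaurus, the tag names (`SIZE`, `NATURE`), the six-word wordlist, and the functions `train` and `score` are all hypothetical and chosen for illustration.

```python
from itertools import product

# Hypothetical toy thesaurus: character -> list of possible semantic tags.
THESAURUS = {
    "大": ["SIZE"], "小": ["SIZE"],
    "山": ["NATURE"], "水": ["NATURE"], "河": ["NATURE"], "火": ["NATURE"],
}
# Hypothetical toy wordlist of two-character words (stands in for the real one).
WORDLIST = ["山水", "大小", "大山", "小河", "大河", "火山"]


def train(wordlist, thesaurus):
    """Collect (possibly fractional) counts for start, transition, emission."""
    start, trans, emit = {}, {}, {}
    for word in wordlist:
        c1, c2 = word
        pairs = list(product(thesaurus[c1], thesaurus[c2]))
        weight = 1.0 / len(pairs)  # split the count evenly over ambiguous tags
        for t1, t2 in pairs:
            start[t1] = start.get(t1, 0.0) + weight
            trans[(t1, t2)] = trans.get((t1, t2), 0.0) + weight
            emit[(t1, c1)] = emit.get((t1, c1), 0.0) + weight
            emit[(t2, c2)] = emit.get((t2, c2), 0.0) + weight
    return start, trans, emit


def score(c1, c2, model, thesaurus):
    """Probability of the character pair, summed over tag pairs (add-one smoothed)."""
    start, trans, emit = model
    n_tags = len({t for tags in thesaurus.values() for t in tags})
    n_chars = len(thesaurus)
    n_start = sum(start.values())
    total = 0.0
    # Characters missing from the thesaurus have no tags, so the score stays 0.
    for t1, t2 in product(thesaurus.get(c1, []), thesaurus.get(c2, [])):
        n_from_t1 = sum(v for (a, _), v in trans.items() if a == t1)
        n_emit_t1 = sum(v for (t, _), v in emit.items() if t == t1)
        n_emit_t2 = sum(v for (t, _), v in emit.items() if t == t2)
        p_start = (start.get(t1, 0.0) + 1) / (n_start + n_tags)
        p_trans = (trans.get((t1, t2), 0.0) + 1) / (n_from_t1 + n_tags)
        p_emit1 = (emit.get((t1, c1), 0.0) + 1) / (n_emit_t1 + n_chars)
        p_emit2 = (emit.get((t2, c2), 0.0) + 1) / (n_emit_t2 + n_chars)
        total += p_start * p_trans * p_emit1 * p_emit2
    return total


if __name__ == "__main__":
    model = train(WORDLIST, THESAURUS)
    # Neither string is in the toy wordlist; only the tag order differs.
    print(score("大", "水", model, THESAURUS))
    print(score("水", "大", model, THESAURUS))
```

In this toy setup the pair "大水" is absent from the wordlist, yet it scores higher than the reversed pair because the tag sequence SIZE followed by NATURE was observed in training while NATURE followed by SIZE was not. That is the kind of generalization a purely character-based model cannot make for unseen character pairs.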