"Word" is difficult to define in the languages that do not exhibit explicit word boundary, such as Thai. Traditional methods on defining words for this kind of languages have to depend on human judgement which bases on unclear criteria or procedures, and have several limitations. This paper proposes an algorithm for word extraction from Thai texts without borrowing a hand from word segmentation. We employ the c4.5 learning algorithm for this task. Several attributes such as string length, frequency, mutual information and entropy are chosen for word/non-word determination. Our experiment yields high precision results about 85% in both training and test corpus.