Some languages, including Thai, Japanese, and Chinese, do not mark word boundaries explicitly. The resulting word-boundary ambiguity lowers the accuracy of information retrieval. This paper proposes a new technique, called character clustering, that reduces word-boundary ambiguity in Thai documents and thereby improves search efficiency. To evaluate the technique, a set of experiments on Thai newspaper text was conducted using both non-indexing and indexing search approaches. The experimental results show that our method outperforms the traditional methods in both approaches on all measures.
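The abstract does not spell out the clustering rules. As a rough illustration only, the sketch below assumes a simplified, hypothetical rule set: dependent vowels and tone marks attach to the preceding character, and the leading vowels เ แ โ ใ ไ attach to the following one. It then shows how cluster boundaries can filter out raw substring matches that would split a cluster, in the spirit of the non-indexing approach. The function names `cluster_boundaries` and `cluster_aligned_find` are hypothetical, and the rules are not the paper's own.

```python
"""Sketch: cluster-aligned substring search for Thai text.

ASSUMPTION: the clustering rules are a deliberately simplified,
hypothetical approximation of Thai character clusters, not the
rule set used in the paper.
"""

# Dependent marks that cannot start a cluster (simplified assumption):
# mai han-akat, above/below vowels, phinthu, maitaikhu, tone marks, etc.
_COMBINING = {0x0E31, *range(0x0E34, 0x0E3B), *range(0x0E47, 0x0E4F)}
# Following vowels (sara a, sara aa, sara am, lakkhangyao) that attach
# to the preceding character.
_FOLLOWING = {0x0E30, 0x0E32, 0x0E33, 0x0E45}
# Leading vowels (U+0E40..U+0E44) that attach to the following consonant.
_LEADING = set(range(0x0E40, 0x0E45))


def cluster_boundaries(text: str) -> set[int]:
    """Return the character offsets at which a cluster may start or end."""
    bounds = {0, len(text)}
    for i in range(1, len(text)):
        cur, prev = ord(text[i]), ord(text[i - 1])
        if cur in _COMBINING or cur in _FOLLOWING:
            continue  # a dependent vowel or mark stays with what precedes it
        if prev in _LEADING:
            continue  # a leading vowel stays with the consonant after it
        bounds.add(i)
    return bounds


def cluster_aligned_find(text: str, pattern: str) -> list[int]:
    """Return start offsets of `pattern` in `text` whose match begins and
    ends on cluster boundaries; raw matches that split a cluster are dropped."""
    if not pattern:
        return []
    bounds = cluster_boundaries(text)
    hits = []
    start = text.find(pattern)
    while start != -1:
        if start in bounds and start + len(pattern) in bounds:
            hits.append(start)
        start = text.find(pattern, start + 1)  # allow overlapping matches
    return hits


if __name__ == "__main__":
    text = "\u0e01\u0e34\u0e19\u0e02\u0e49\u0e32\u0e27"  # "กินข้าว"
    print(cluster_aligned_find(text, "\u0e02"))  # "ข" alone splits "ข้า": []
    print(cluster_aligned_find(text, "\u0e02\u0e49\u0e32"))  # "ข้า": [3]
```

A plain substring search would report "ข" at offset 3, but under these assumed rules that match cuts through the cluster "ข้า", so it is rejected. As a further assumption, an indexing approach could use the same boundary set to limit which text positions an index records.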