Chinese text has no blanks to mark word boundaries. As a result, identifying words is difficult because of segmentation ambiguities and the occurrence of unknown words. Conventionally, unknown words have been extracted by statistical methods, because such methods are simple and efficient. However, statistical methods that use no linguistic knowledge suffer from low precision and low recall. Character strings with statistical significance may be phrases or partial phrases rather than words, and low-frequency new words are hard to identify statistically. In addition to statistical information, we try to use as much information as possible, such as morphology, syntax, semantics, and world knowledge. The identification system fully utilizes the context and content information of unknown words in each of its three steps: detection, extraction, and verification. A practical unknown word extraction system was implemented that identifies new words online, including low-frequency new words, with high precision and high recall.
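The two failure modes of purely statistical extraction named above can be made concrete with a minimal sketch of a generic frequency-threshold baseline. This is an assumed illustration of the conventional approach only. It is not the detection, extraction, and verification system described in this abstract. The function name `candidate_ngrams`, the n-gram lengths, the threshold, and the toy corpus are all hypothetical choices.

```python
from collections import Counter


def candidate_ngrams(corpus, n_min=2, n_max=4, min_freq=2):
    """Return character n-grams that occur at least `min_freq` times.

    Hypothetical frequency-threshold baseline for illustration only;
    it uses no linguistic knowledge at all.
    """
    counts = Counter()
    for sentence in corpus:
        # Count every character substring of length n_min..n_max.
        for n in range(n_min, n_max + 1):
            for i in range(len(sentence) - n + 1):
                counts[sentence[i:i + n]] += 1
    # Keep only "statistically significant" strings (here: frequent enough).
    return {gram: c for gram, c in counts.items() if c >= min_freq}


if __name__ == "__main__":
    # Hypothetical toy corpus: 电子商务 (e-commerce) recurs, 平台 (platform) appears once.
    toy_corpus = ["电子商务发展", "电子商务平台", "发展电子商务"]
    for gram, freq in sorted(candidate_ngrams(toy_corpus).items()):
        print(gram, freq)
```

On this toy corpus the baseline keeps 电子商务, but it also accepts the fragments 子商, 电子商, and 子商务, which are not words. Meanwhile 平台, seen only once, falls below the threshold and is missed. Lowering `min_freq` to recover such rare words would admit even more fragments, which is why the abstract argues for adding linguistic and contextual evidence.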