Chinese abbreviations are frequently used without being defined, which creates considerable difficulty for NLP. In this study, we propose the definition-independent abbreviation identification problem and treat it as a classification task in which abbreviation candidates are classified as either 'abbreviation' or 'non-abbreviation' according to the posterior probability. To identify new abbreviations from existing ones, we add generalization capability to the abbreviation lexicon by replacing words with word classes, thereby creating abbreviation-templates. An SVM model serves as the classifier, using abbreviation-template features together with context information. Evaluation on a raw Chinese corpus yields encouraging performance. Our experiments further demonstrate improvement after integrating morphological analysis, substring analysis and person name identification.
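The template idea can be illustrated with a minimal sketch. It is only one possible reading of the abstract, not the authors' implementation: the word-class map, the unit-level tokenization of abbreviations, and the function names `to_template`, `build_templates` and `template_feature` are all hypothetical. Each lexicon entry is mapped to a sequence of word classes, and a candidate receives a binary feature indicating whether its class sequence matches a known template. In the full approach, such a feature would be combined with context features and passed to the SVM.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical word-class map (an assumption for illustration only);
# in practice the classes might come from automatic word clustering.
WORD_CLASS: Dict[str, str] = {
    "北": "PLACE",
    "南": "PLACE",
    "大": "SCHOOL",
}


def to_template(units: List[str]) -> Tuple[str, ...]:
    """Replace each unit with its word class; unknown units stay as-is."""
    return tuple(WORD_CLASS.get(u, u) for u in units)


def build_templates(lexicon: List[List[str]]) -> Set[Tuple[str, ...]]:
    """Generalize every known abbreviation into an abbreviation-template."""
    return {to_template(entry) for entry in lexicon}


def template_feature(candidate: List[str], templates: Set[Tuple[str, ...]]) -> int:
    """Binary feature: 1 if the candidate's class sequence is a known template."""
    return int(to_template(candidate) in templates)


# Toy lexicon with one known abbreviation: 北大 (Peking University).
lexicon = [["北", "大"]]
templates = build_templates(lexicon)

# 南大 is not in the lexicon but shares the template (PLACE, SCHOOL).
print(template_feature(["南", "大"], templates))  # 1
# 南北 maps to (PLACE, PLACE), which matches no template.
print(template_feature(["南", "北"], templates))  # 0
```

The point of the sketch is the generalization step: a lexicon holding only 北大 still produces a positive template feature for the unseen candidate 南大, because both reduce to the same class sequence.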