Supervised methods for symptom name recognition in free-text clinical records of traditional Chinese medicine: An empirical study

Authors:
Yaqiang Wang;Zhonghua Yu;Li Chen;Yunhui Chen;Yiguang Liu;Xiaoguang Hu;Yongguang Jiang
Affiliations:
Department of Computer Science, Sichuan University, Chengdu, Sichuan 610064, PR China;Department of Computer Science, Sichuan University, Chengdu, Sichuan 610064, PR China;Department of Computer Science, Sichuan University, Chengdu, Sichuan 610064, PR China;School of Fundamental Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan 610075, PR China;Department of Computer Science, Sichuan University, Chengdu, Sichuan 610064, PR China;No. 1 Clinical Hospital, Beihua University, Jilin, Jilin 132011, PR China;Department of Preclinical Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan 610075, PR China
Venue:
Journal of Biomedical Informatics
Year:
2014

Abstract

Clinical records of traditional Chinese medicine (TCM) are documented by TCM doctors during their routine diagnostic work. These records contain abundant knowledge and reflect the clinical experience of TCM doctors. In recent years, with the modernization of TCM clinical practice, these records have begun to be digitized. Data mining (DM) and machine learning (ML) methods give researchers an opportunity to discover TCM regularities buried in the large volume of clinical records. Some work has addressed this problem, but existing methods have been validated on a limited amount of manually structured data, whereas the contents of most fields in the clinical records are unstructured. As a result, methods verified on well-structured data will not work effectively on free-text clinical records (FCRs), which must therefore be structured in advance. Manually structuring the large volume of TCM FCRs is time-consuming and labor-intensive, and the development of automatic methods for the structuring task is still at an early stage. This paper therefore studies symptom name recognition (SNR) in the chief complaints, one of the important tasks in structuring TCM FCRs. The SNR task is treated as a sequence labeling problem, and several fundamental and practical questions are studied, such as how to adapt a general sequence labeling strategy to the SNR task according to the domain-specific characteristics of the chief complaints, and which sequence classifier is more appropriate for the SNR task. To answer these questions, a series of experiments was performed, and the results are explained in detail.
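
To make the sequence labeling formulation concrete, the following is a minimal sketch of how symptom names in a chief complaint could be encoded as per-character labels. It assumes a plain B/I/O tag set (B = first character of a symptom name, I = a later character of one, O = outside any symptom name), which may differ from the tagging strategy the paper actually evaluates. The sample chief complaint, its span offsets, and the helper names `spans_to_bio` and `bio_to_spans` are hypothetical and invented for illustration only.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open [start, end) character offsets


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Turn annotated symptom-name spans into one B/I/O tag per character."""
    tags = ["O"] * len(text)
    for start, end in spans:
        tags[start] = "B"              # first character of a symptom name
        for i in range(start + 1, end):
            tags[i] = "I"              # remaining characters of the same name
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Recover symptom-name spans from a predicted B/I/O tag sequence."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:      # a new name starts right after another
                spans.append((start, i))
            start = i
        elif tag == "O":
            if start is not None:      # the current name ends here
                spans.append((start, i))
                start = None
        # tag == "I": extend the open span; a stray "I" with no open span is ignored
    if start is not None:              # a name that runs to the end of the text
        spans.append((start, len(tags)))
    return spans


# Hypothetical chief complaint: "cough for three days, accompanied by fever".
text = "咳嗽三天，伴发热"
gold_spans = [(0, 2), (6, 8)]          # 咳嗽 (cough), 发热 (fever)

tags = spans_to_bio(text, gold_spans)
recovered = [text[s:e] for s, e in bio_to_spans(tags)]
```

Under this encoding, a sequence classifier such as a conditional random field predicts one tag per character, and the recognized symptom names are read back from the predicted tags with a decoder like `bio_to_spans`.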