Technical term recognition with semi-supervised learning using hierarchical Bayesian language models

Authors:
Ryo Fujii; Akito Sakurai
Affiliations:
Keio University, Yokohama, Kanagawa, Japan; Keio University, Yokohama, Kanagawa, Japan
Venue:
NLDB'12 Proceedings of the 17th international conference on Applications of Natural Language Processing and Information Systems
Year:
2012

Abstract

Recognizing technical terms requires term dictionaries or tagged corpora, but these are costly to compile. Moreover, a term may have several surface representations and new terms continue to be coined, so simply building a dictionary cannot solve the problem. In this research, to reduce the cost of creating dictionaries, we aimed to build a system that learns to recognize terminology from a small tagged corpus using semi-supervised learning. We addressed the problem by combining a tag-level language model and a character-level language model based on the hierarchical Pitman-Yor language model (HPYLM). We performed experiments on the recognition of biomedical terms. With supervised learning, we achieved an F-measure of 65%, which is 8 percentage points behind the best existing system, which relies on many domain-specific heuristics. With semi-supervised learning, our method maintained accuracy better than existing methods as the amount of supervised data was reduced.
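
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an F-measure for term recognition is commonly computed. It assumes exact-match scoring over (start, end, label) spans, which the abstract does not specify; the function name `f_measure` and the example spans are hypothetical and are not taken from the paper.

```python
def f_measure(gold, predicted):
    """Return (precision, recall, F1) for two collections of term spans.

    Assumption: a predicted span counts as correct only if its start,
    end, and label all match a gold span exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans: (start_token, end_token, label)
gold = {(0, 2, "protein"), (5, 6, "DNA"), (9, 11, "cell_type")}
predicted = {(0, 2, "protein"), (5, 7, "DNA"), (9, 11, "cell_type")}

precision, recall, f1 = f_measure(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F={f1:.2f}")
```

Under this exact-match assumption, a term whose boundary is off by one token counts as both a false positive and a false negative, which is one reason boundary-sensitive scores for multi-word technical terms tend to be low.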