Extracting pronunciation-translated names from Chinese texts using bootstrapping approach

Authors:
Jing Xiao;Jimin Liu;Tat-Seng Chua
Affiliations:
National University of Singapore;National University of Singapore;National University of Singapore
Venue:
SIGHAN '02 Proceedings of the first SIGHAN workshop on Chinese language processing - Volume 18
Year:
2002

Abstract

Pronunciation-translated names (P-Names) introduce additional ambiguity into Chinese word segmentation and generic named entity recognition. Because few annotated resources are available for developing a good P-Name extraction system, this paper presents a bootstrapping algorithm, called PN-Finder, to tackle this problem. Starting from a small set of P-Name characters and context cue-words, the algorithm iteratively locates more P-Names from the Internet, using a combination of P-Name and context word probabilities to identify new P-Names. Experiments show that PN-Finder locates a large number of P-Names (over 100,000) from the Internet with a recognition accuracy of over 85%. Further tests on the MET-2 test set show that PN-Finder achieves an F1 value of over 90% in locating P-Names. These results indicate that PN-Finder is effective at extracting P-Names despite the scarcity of annotated data.
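
The bootstrapping loop described in the abstract can be illustrated with a small sketch. This is not the PN-Finder implementation: the abstract only states that P-Name and context word probabilities are combined, so the scoring functions, the linear weighting, the threshold, the input format (pre-extracted candidate strings with their neighbouring context words), and all names below are assumptions chosen for illustration.

```python
# Toy bootstrapping sketch (hypothetical; not the paper's actual PN-Finder).
# Each occurrence is (candidate string, tuple of neighbouring context words).

def name_score(candidate, name_chars):
    """Fraction of the candidate's characters already known as P-Name characters."""
    return sum(ch in name_chars for ch in candidate) / len(candidate)

def context_score(contexts, cue_words):
    """Fraction of the candidate's context words that are known cue-words."""
    return sum(word in cue_words for word in contexts) / len(contexts)

def bootstrap(occurrences, seed_chars, seed_cues,
              weight=0.6, threshold=0.5, max_rounds=10):
    """Iteratively accept candidates and grow the character and cue-word sets.

    `weight` and `threshold` are illustrative values, not taken from the paper.
    """
    name_chars, cue_words, names = set(seed_chars), set(seed_cues), set()
    for _ in range(max_rounds):
        accepted = []
        for candidate, contexts in occurrences:
            if candidate in names:
                continue
            score = (weight * name_score(candidate, name_chars)
                     + (1 - weight) * context_score(contexts, cue_words))
            if score >= threshold:
                accepted.append((candidate, contexts))
        if not accepted:
            break  # nothing new was found, so the loop has converged
        # Update the resources only after the round, so that every candidate
        # in a round is scored against the same character and cue-word sets.
        for candidate, contexts in accepted:
            names.add(candidate)
            name_chars.update(candidate)
            cue_words.update(contexts)
    return names

# Made-up example data: three transliterated names and one ordinary place name.
SEED_CHARS = {"克", "林", "顿", "尔", "布"}
SEED_CUES = {"总统"}  # "president"
OCCURRENCES = [
    ("克林顿", ("总统", "访问")),  # Clinton
    ("布莱尔", ("首相", "访问")),  # Blair
    ("莱斯", ("访问",)),           # Rice
    ("北京市", ("访问",)),         # Beijing city, not a P-Name
]

found = bootstrap(OCCURRENCES, SEED_CHARS, SEED_CUES)
print(sorted(found))
```

In this toy run, the first round accepts only 克林顿 and learns the cue-word 访问; the second round then accepts 布莱尔 and learns the character 莱; the third accepts 莱斯. 北京市 is never accepted because context evidence alone stays below the threshold. Adding every context word of an accepted name to the cue-word set is a simplification that would drift on real data; a practical system would need some filtering of learned cue-words.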