Learning dictionaries for information extraction by multi-level bootstrapping
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
Snowball: extracting relations from large plain-text collections
DL '00 Proceedings of the fifth ACM conference on Digital libraries
Extracting Patterns and Relations from the World Wide Web
WebDB '98 Selected papers from the International Workshop on The World Wide Web and Databases
Learning pattern rules for Chinese named entity extraction
Eighteenth national conference on Artificial intelligence
Building semantic perceptron net for topic spotting
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
A phonetic similarity model for automatic extraction of transliteration pairs
ACM Transactions on Asian Language Information Processing (TALIP)
Hi-index | 0.00 |
Pronunciation-translated names (P-Names) bring more ambiguities to Chinese word segmentation and generic named entity recognition. As there are few annotated resources that can be used to develop a good P-Name extraction system, this paper presents a bootstrapping algorithm, called PN-Finder, to tackle this problem. Starting from a small set of P-Name characters and context cue-words, the algorithm iteratively locates more P-Names from the Internet. The algorithm uses a combination of P-Name and context word probabilities to identify new P-Names. Experiments show that our PN-Finder is able to locate a large number of P-Names (over 100,000) from the Internet with a high recognition accuracy of over 85%. Further tests on the MET-2 test set show that our PN-Finder can achieve a performance of over 90% in F1 value in locating P-Names. The results demonstrate that our PN-Finder is effective.