A robust web personal name information extraction system

Authors:
Ying Chen; Sophia Yat Mei Lee; Chu-Ren Huang
Affiliations:
College of Information and Electrical Engineering, China Agricultural University, PR China and Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong; Language Centre, Hong Kong Baptist University, Hong Kong and Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong; Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong
Venue:
Expert Systems with Applications: An International Journal
Year:
2012

Abstract

Personal information extraction, which extracts the persons in question and their related information (such as biographical information and occupation) from the web, is an important component in constructing social networks (a kind of semantic web). This practical task raises two important issues: personal named entity ambiguity, and the extraction of personal information for a specific person. For personal named entity ambiguity, a common phenomenon in fast-growing web resources, we propose a robust system that extracts lightweight features from broad resources with a totally unsupervised approach. The experiments show that these lightweight features not only improve performance but also increase the robustness of a disambiguation system. To extract the information of the focus person, we introduce an integrated system that effectively re-uses and combines current well-developed tools for web data while also identifying the expression properties of web data. We show that our flexible extraction system achieves state-of-the-art performance, in particular high precision, which is very important for real applications.