Automatically Extracting Personal Name Aliases from the Web

Authors:
Danushka Bollegala;Taiki Honma;Yutaka Matsuo;Mitsuru Ishizuka
Affiliations:
The University of Tokyo, Tokyo, Japan 113-8656;The University of Tokyo, Tokyo, Japan 113-8656;The University of Tokyo, Tokyo, Japan 113-8656;The University of Tokyo, Tokyo, Japan 113-8656
Venue:
GoTAL '08 Proceedings of the 6th international conference on Advances in Natural Language Processing
Year:
2008

Citing 17
Cited 1

Foundations of statistical natural language processing

Foundations of statistical natural language processing
Modern Information Retrieval

Modern Information Retrieval
Mining the Web: Discovering Knowledge from HyperText Data

Mining the Web: Discovering Knowledge from HyperText Data
Optimizing search engines using clickthrough data

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Semantic search

WWW '03 Proceedings of the 12th international conference on World Wide Web
Automatic retrieval and clustering of similar words

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Entity-based cross-document coreferencing using the Vector Space Model

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Automatic acquisition of hyponyms from large text corpora

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
Finding parts in very large corpora

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Disambiguating Web appearances of people in a social network

WWW '05 Proceedings of the 14th international conference on World Wide Web
Learning surface text patterns for a Question Answering system

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
A testbed for people searching strategies in the WWW

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Unsupervised personal name disambiguation

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
POLYPHONET: an advanced social network extraction system from the web

Proceedings of the 15th international conference on World Wide Web
Measuring semantic similarity between words using web search engines

Proceedings of the 16th international conference on World Wide Web
Approximate personal name-matching through finite-state graphs

Journal of the American Society for Information Science and Technology
Extracting mnemonic names of people from the web

ICADL'06 Proceedings of the 9th international conference on Asian Digital Libraries: achievements, Challenges and Opportunities

The vector space models for finding co-occurrence names as aliases in Thai sports news

ACIIDS'10 Proceedings of the Second international conference on Intelligent information and database systems: Part I

Quantified Score

Hi-index	0.00

Visualization

Abstract

Extracting aliases of an entity is important for various tasks such as identification of relations among entities, web search and entity disambiguation. To extract relations among entities properly, one must first identify those entities. We propose a novel approach to find aliases of a given name using automatically extracted lexical patterns. We exploit a set of known names and their aliases as training data and extract lexical patterns that convey information related to aliases of names from text snippets returned by a web search engine. The patterns are then used to find candidate aliases of a given name. We use anchor texts to design a word co-occurrence model and use it to define various ranking scores to measure the association between a name and a candidate alias. The ranking scores are integrated with page-count-based association measures using support vector machines to leverage a robust alias detection method. The proposed method outperforms numerous baselines and previous work on alias extraction on a dataset of personal names, achieving a statistically significant mean reciprocal rank of 0.6718. Experiments carried out using a dataset of location names and Japanese personal names suggest the possibility of extending the proposed method to extract aliases for different types of named entities and for other languages. Moreover, the aliases extracted using the proposed method improve recall by 20% in a relation-detection task.