Mining for personal name aliases on the web

Authors:
Danushka Bollegala;Taiki Honma;Yutaka Matsuo;Mitsuru Ishizuka
Affiliations:
The University of Tokyo, Tokyo, Japan;The University of Tokyo, Tokyo, Japan;The University of Tokyo, Tokyo, Japan;The University of Tokyo, Tokyo, Japan
Venue:
Proceedings of the 17th international conference on World Wide Web
Year:
2008

Citing 3

Disambiguating Web appearances of people in a social network
WWW '05 Proceedings of the 14th international conference on World Wide Web

Measuring semantic similarity between words using web search engines
Proceedings of the 16th international conference on World Wide Web

Extracting mnemonic names of people from the web
ICADL'06 Proceedings of the 9th international conference on Asian Digital Libraries: achievements, Challenges and Opportunities

Cited 7

On knowledge-poor methods for person name matching and lemmatization for highly inflectional languages
Information Retrieval

Using web information for author name disambiguation
Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries

Using the web to validate lexico-semantic relations
EPIA'11 Proceedings of the 15th Portuguese conference on Progress in artificial intelligence

Towards alias detection without string similarity: an active learning based approach
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval

Source code identifier splitting using Yahoo image and web search engine
Proceedings of the First International Workshop on Software Mining

Semantic similarity measurement using historical google search patterns
Information Systems Frontiers

Toward detection of aliases without string similarity
Information Sciences: an International Journal

Abstract

We propose a novel approach to find aliases of a given name from the web. We exploit a set of known names and their aliases as training data and extract lexical patterns that convey information related to aliases of names from text snippets returned by a web search engine. The patterns are then used to find candidate aliases of a given name. We use anchor texts and hyperlinks to design a word co-occurrence model and define numerous ranking scores to evaluate the association between a name and its candidate aliases. The proposed method outperforms numerous baselines and previous work on alias extraction on a dataset of personal names by a statistically significant margin, achieving a mean reciprocal rank of 0.6718. Moreover, the aliases extracted using the proposed method improve recall by 20% in a relation-detection task.
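
The mean reciprocal rank reported above can be illustrated with a short sketch. It assumes the usual definition of the metric: the average, over query names, of 1 divided by the rank of the first correct alias, counting 0 when no correct alias is returned. The function name, the dictionary layout, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def mean_reciprocal_rank(
    ranked: Dict[str, List[str]], gold: Dict[str, Set[str]]
) -> float:
    """Average over names of 1 / rank of the first correct alias (0 if none)."""
    total = 0.0
    for name, candidates in ranked.items():
        correct = gold.get(name, set())
        # Ranks are 1-based: the top candidate has rank 1.
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in correct:
                total += 1.0 / rank
                break  # only the first correct alias counts
    return total / len(ranked) if ranked else 0.0


# Toy data invented for illustration only (not from the paper's dataset).
ranked = {
    "name_a": ["alias_x", "alias_y", "alias_z"],  # first hit at rank 2
    "name_b": ["alias_p", "alias_q"],             # first hit at rank 1
    "name_c": ["alias_m"],                        # no correct alias
}
gold = {
    "name_a": {"alias_y"},
    "name_b": {"alias_p"},
    "name_c": {"alias_n"},
}

print(mean_reciprocal_rank(ranked, gold))  # (1/2 + 1 + 0) / 3 = 0.5
```

Under this definition, a score of 0.6718 means that the first correct alias tends to appear near the top of the ranked candidate list, typically at rank one or two.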