Resolving surface forms to Wikipedia topics

Authors:
Yiping Zhou;Lan Nie;Omid Rouhani-Kalleh;Flavian Vasile;Scott Gaffney
Affiliations:
Yahoo! Labs at Sunnyvale;Yahoo! Labs at Sunnyvale;Yahoo! Labs at Sunnyvale;Yahoo! Labs at Sunnyvale;Yahoo! Labs at Sunnyvale
Venue:
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Year:
2010

Abstract

Ambiguity of entity mentions and concept references is a challenge to mining text beyond surface-level keywords. We describe an effective method of disambiguating surface forms and resolving them to Wikipedia entities and concepts. Our method employs an extensive set of features mined from Wikipedia and other large data sources, and combines the features using a machine learning approach with automatically generated training data. Based on a manually labeled evaluation set containing over 1000 news articles, our resolution model has 85% precision and 87.8% recall. The performance is significantly better than three baselines based on traditional context similarities or sense commonness measurements. Our method can be applied to other languages and scales well to new entities and concepts.
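
The abstract names sense commonness as the basis for one family of baselines. As a rough illustration only, and not the authors' implementation, the sketch below resolves a surface form to the Wikipedia topic it most often links to. The anchor-link data, the topic identifiers, and the function names `build_commonness` and `resolve_by_commonness` are all hypothetical and chosen for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (surface form, Wikipedia topic it was linked to).
# A real system would mine these pairs from Wikipedia anchor text.
ANCHOR_LINKS = [
    ("jaguar", "Jaguar_Cars"),
    ("jaguar", "Jaguar_Cars"),
    ("jaguar", "Jaguar"),
    ("python", "Python_(programming_language)"),
]


def build_commonness(links):
    """Map each lowercased surface form to {topic: fraction of its links}."""
    counts = defaultdict(Counter)
    for surface, topic in links:
        counts[surface.lower()][topic] += 1
    table = {}
    for surface, topic_counts in counts.items():
        total = sum(topic_counts.values())
        table[surface] = {t: n / total for t, n in topic_counts.items()}
    return table


def resolve_by_commonness(surface, table):
    """Return the most frequently linked topic, or None if the form is unseen."""
    candidates = table.get(surface.lower())
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


table = build_commonness(ANCHOR_LINKS)
best = resolve_by_commonness("Jaguar", table)
```

A baseline of this kind ignores the surrounding text entirely, so it always picks the same topic for a given surface form. The method described in the abstract instead combines many features with a learned model.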