Named entity disambiguation in streaming data

Authors:
Alexandre Davis;Adriano Veloso;Altigran S. da Silva;Wagner Meira, Jr.;Alberto H. F. Laender
Affiliations:
Federal University of Minas Gerais;Federal University of Minas Gerais;Federal University of Amazonas;Federal University of Minas Gerais;Federal University of Minas Gerais
Venue:
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Year:
2012

Abstract

The named entity disambiguation task is to resolve the many-to-many correspondence between ambiguous names and unique real-world entities. This task can be modeled as a classification problem, provided that positive and negative examples are available for learning binary classifiers. High-quality sense-annotated data, however, are hard to obtain in streaming environments, since the training corpus would have to be constantly updated to accommodate the fresh data arriving on the stream. On the other hand, a few positive examples plus large amounts of unlabeled data can be acquired easily. Producing binary classifiers directly from these data, however, leads to poor disambiguation performance. Thus, we propose to enhance the quality of the classifiers using finer-grained variations of the well-known Expectation-Maximization (EM) algorithm. We conducted a systematic evaluation using Twitter streaming data, and the results show that our classifiers are extremely effective, providing improvements ranging from 1% to 20% over the current state-of-the-art biased SVMs while being more than 120 times faster.
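The general idea of learning from a few positive examples plus unlabeled data with EM can be illustrated with a minimal sketch. This is not the authors' method: the abstract describes finer-grained EM variations, whereas the code below uses the classic document-level EM loop with a small multinomial Naive Bayes model, chosen only because it is compact. The function names (train_nb, posterior_pos, pu_em), the iteration count, and the toy "jaguar" messages are all hypothetical and serve illustration only.

```python
import math
from collections import Counter

def train_nb(docs, weights, vocab):
    """M-step: fit a two-class multinomial Naive Bayes from soft labels.
    weights[i] is the current estimate of P(positive | docs[i])."""
    # Smoothed class prior, always strictly between 0 and 1.
    prior_pos = (sum(weights) + 1.0) / (len(docs) + 2.0)
    counts = {1: Counter(), 0: Counter()}
    for doc, w in zip(docs, weights):
        for tok in doc:
            counts[1][tok] += w          # fractional count for the positive class
            counts[0][tok] += 1.0 - w    # remainder goes to the negative class
    log_like = {}
    for c in (0, 1):
        total = sum(counts[c].values()) + len(vocab)  # Laplace smoothing
        log_like[c] = {t: math.log((counts[c][t] + 1.0) / total) for t in vocab}
    return prior_pos, log_like

def posterior_pos(doc, prior_pos, log_like):
    """E-step for one document: P(positive | doc) under the current model."""
    lp = math.log(prior_pos) + sum(log_like[1][t] for t in doc)
    ln = math.log(1.0 - prior_pos) + sum(log_like[0][t] for t in doc)
    m = max(lp, ln)  # subtract the max for numerical stability
    return math.exp(lp - m) / (math.exp(lp - m) + math.exp(ln - m))

def pu_em(positives, unlabeled, iterations=10):
    """Return P(positive) for each unlabeled document after EM."""
    docs = positives + unlabeled
    vocab = {t for d in docs for t in d}
    # Initial guess: every unlabeled document is treated as negative.
    weights = [1.0] * len(positives) + [0.0] * len(unlabeled)
    for _ in range(iterations):
        prior, log_like = train_nb(docs, weights, vocab)
        # Known positives stay fixed; only unlabeled documents are re-estimated.
        weights = [1.0] * len(positives) + [
            posterior_pos(d, prior, log_like) for d in unlabeled
        ]
    return weights[len(positives):]

# Hypothetical toy stream: the target entity is the car brand "jaguar".
positives = [["jaguar", "car", "engine"], ["jaguar", "dealer", "car"]]
unlabeled = [
    ["jaguar", "engine", "dealer"],  # shares context words with the positives
    ["jaguar", "jungle", "cat"],     # likely the animal
    ["jaguar", "cat", "prey"],       # likely the animal
]
scores = pu_em(positives, unlabeled)
for doc, score in zip(unlabeled, scores):
    print(" ".join(doc), round(score, 3))
```

In this sketch the unlabeled message that shares context words with the labeled positives should end up with a higher score than the two animal-related messages. In a streaming setting one would presumably re-run or incrementally update such a model as new messages arrive; how the paper handles this is not stated in the abstract.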