This paper describes a generative approach to identity resolution in a completely unsupervised setting, with no fixed assumption about the true number of identities. Identity resolution (also called entity resolution) involves associating references to authors, such as the names in a paper's author list, with the real identities underlying them. References to the same person may be written in different forms or contain errors, and identical references may refer to different people. The approach taken here resolves identities across a corpus of documents using a generative model of both each document's abstract and its author list. In the model, authors and topics are associated with latent groups, and each document's abstract and author list are generated conditioned on a single group. Results on real-world datasets show that the approach outperforms the best-performing unsupervised methods.
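To make the generative story concrete, the following is a minimal toy sketch of how one document might be sampled from a group-based model of this kind. It is not the paper's model. The class `ToyGroupModel`, the symmetric Dirichlet priors, the fixed numbers of groups and identities, and the `name_variants` table mapping each true identity to its possible written forms are all hypothetical assumptions made for illustration. The paper explicitly avoids fixing the number of identities; this sketch fixes it only to keep the example short.

```python
import random


def dirichlet(rng: random.Random, alpha: float, k: int) -> list:
    """Sample a symmetric Dirichlet(alpha) vector of length k via normalized gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def categorical(rng: random.Random, probs: list) -> int:
    """Sample an index in range(len(probs)) with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


class ToyGroupModel:
    """Hypothetical toy generative model: latent groups own topic and identity distributions.

    All parameters here are illustrative assumptions, not values from the paper.
    """

    def __init__(self, vocab, name_variants, num_groups=2, num_topics=3,
                 alpha=0.5, seed=0):
        self.rng = random.Random(seed)
        self.vocab = list(vocab)
        self.name_variants = dict(name_variants)  # true identity -> written reference forms
        self.identities = list(self.name_variants)
        self.num_groups = num_groups
        # Mixing weights over latent groups.
        self.group_weights = dirichlet(self.rng, alpha, num_groups)
        # Each group has its own distribution over topics and over true identities.
        self.group_topics = [dirichlet(self.rng, alpha, num_topics)
                             for _ in range(num_groups)]
        self.group_identities = [dirichlet(self.rng, alpha, len(self.identities))
                                 for _ in range(num_groups)]
        # Each topic is a distribution over the vocabulary.
        self.topic_words = [dirichlet(self.rng, alpha, len(self.vocab))
                            for _ in range(num_topics)]

    def generate_document(self, num_words: int, num_authors: int) -> dict:
        # 1. Pick the document's latent group.
        g = categorical(self.rng, self.group_weights)
        # 2. Author list: draw a true identity from the group, then one of its written forms.
        authors = []
        for _ in range(num_authors):
            identity = self.identities[categorical(self.rng, self.group_identities[g])]
            reference = self.rng.choice(self.name_variants[identity])
            authors.append((identity, reference))
        # 3. Abstract: each word picks a topic from the group, then a word from that topic.
        words = []
        for _ in range(num_words):
            z = categorical(self.rng, self.group_topics[g])
            words.append(self.vocab[categorical(self.rng, self.topic_words[z])])
        return {"group": g, "authors": authors, "abstract": words}
```

For example, `ToyGroupModel(vocab=["graph", "citation", "topic"], name_variants={"A. Smith": ["A. Smith", "Smith, A."], "J. Lee": ["J. Lee"]}).generate_document(num_words=20, num_authors=2)` returns one sampled document. The true identity is returned next to each reference only to make the ambiguity visible; an inference procedure would observe just the reference strings and the abstract words and would have to recover the identities and groups from them.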