Name ambiguity is a special case of identity uncertainty in which one person can be referenced by multiple name variations in different situations, or can share the same name with other people. In this paper, we focus on disambiguating person names in web pages and scientific documents, and present an efficient and effective two-stage approach. In the first stage, we propose two novel topic-based models that extend two hierarchical Bayesian text models, namely Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA). Our models explicitly introduce a new variable for persons and learn the distribution of topics with regard to persons and words. In the second stage, after an initial model has been learned, the topic distributions are treated as feature sets and names are disambiguated with a hierarchical agglomerative clustering method. Experiments on web data and scientific documents from CiteSeer indicate that our approach consistently outperforms other unsupervised learning methods such as spectral clustering and DBSCAN clustering, and that it could be extended to other research fields. We empirically address the issue of scalability by disambiguating authors in over 750,000 papers from the entire CiteSeer dataset.
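The clustering stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the distance measure, linkage criterion, or stopping rule, so the Hellinger distance, average linkage, the `threshold` parameter, and the toy `topic_vectors` below are all assumptions chosen for illustration. Each vector stands for the topic distribution of one document that mentions the same ambiguous name, and documents that end up in the same cluster are treated as referring to the same person.

```python
import math
from itertools import combinations


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters (assumed linkage)."""
    total = sum(hellinger(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_clusters(vectors, threshold):
    """Repeatedly merge the two closest clusters until the closest pair exceeds threshold."""
    clusters = [[i] for i in range(len(vectors))]  # start with one cluster per document
    while len(clusters) > 1:
        (a, b), dist = min(
            (((a, b), average_linkage(clusters[a], clusters[b], vectors))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:  # hypothetical stopping rule
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return [sorted(c) for c in clusters]


# Hypothetical topic distributions for four documents mentioning the same name string.
topic_vectors = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.05, 0.90],
    [0.10, 0.05, 0.85],
]
clusters = agglomerative_clusters(topic_vectors, threshold=0.3)
print(clusters)
```

In this toy input, documents 0 and 1 concentrate on the first topic and documents 2 and 3 on the third, so the sketch groups them into two clusters, one per presumed person. The quadratic search for the closest pair is kept for readability and would not scale to a collection the size of CiteSeer.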