One of the hardest problems faced by current scholarly digital libraries is author name ambiguity. This problem occurs when a set of citation records contains records of the same author under distinct names, or records belonging to distinct authors with similar names. Among the several proposed methods, the most effective ones seem to be those that directly assign records to their respective authors by applying supervised machine learning techniques. The effectiveness of such methods is usually directly correlated with the amount of labeled training data available. However, acquiring training examples requires skilled human annotators to manually label references. To reduce the number of examples that must be labeled to produce the training data, in this paper we propose a new active sampling strategy based on association rules for the author name disambiguation task. We compare our strategy with state-of-the-art supervised baselines that use the complete labeled training dataset, as well as with other active methods, and show that very competitive disambiguation effectiveness can be obtained with reductions in the training set of up to 71%.
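The abstract does not spell out the selection criterion, so the following is only a minimal sketch assuming one plausible rule-based approach. It projects the already-labeled records onto the features of each candidate record and counts the association rules (feature itemset -> author) that projection yields. It then asks the annotator to label the candidate with the fewest rules, on the reasoning that few rules means its features are poorly covered by the current training set. The feature encoding (`coauthor:`, `venue:`, `word:` prefixes), the functions `count_rules` and `active_sample`, the `oracle` callback standing in for the human annotator, and the parameters `max_len` and `min_support` are all hypothetical illustrations, not the paper's implementation.

```python
from collections import Counter
from itertools import combinations


def count_rules(record, labeled, max_len=2, min_support=1):
    """Count rules X -> author, where X is an itemset of up to `max_len`
    features of `record`, mined from `labeled` projected onto `record`."""
    counts = Counter()
    for features, author in labeled:
        shared = sorted(features & record)  # projection: keep shared features only
        for k in range(1, max_len + 1):
            for itemset in combinations(shared, k):
                counts[(itemset, author)] += 1
    return sum(1 for c in counts.values() if c >= min_support)


def active_sample(pool, oracle, budget, max_len=2):
    """Greedily query the unlabeled record that yields the fewest rules,
    i.e. the one least covered by the records labeled so far."""
    pool = list(pool)  # work on a copy so the caller's list is untouched
    labeled = []
    while pool and len(labeled) < budget:
        # min() returns the first minimal index, so ties go to the earliest record
        best = min(range(len(pool)),
                   key=lambda i: count_rules(pool[i], labeled, max_len))
        record = pool.pop(best)
        labeled.append((record, oracle(record)))  # ask the annotator for a label
    return labeled


# Toy citation records: each is a set of hypothetical prefixed features.
records = [
    frozenset({"coauthor:silva", "venue:jcdl", "word:citation"}),
    frozenset({"coauthor:silva", "venue:jcdl", "word:ranking"}),
    frozenset({"coauthor:lee", "venue:kdd", "word:graph"}),
    frozenset({"coauthor:lee", "venue:icdm", "word:citation"}),
]


def oracle(record):
    """Hypothetical stand-in for the human annotator."""
    return "A" if "coauthor:silva" in record else "B"


picked = active_sample(records, oracle, budget=3)
# Selects records[0], records[2], records[3]; records[1] is skipped
# because it largely repeats records[0].
```

Preferring the candidate with the fewest rules spends the labeling budget on records whose coauthors, venues, and title words are not yet represented, so near-duplicates such as `records[1]` are not sent to the annotator.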