Entity aliases are common, and detecting them accurately plays a vital role in many applications. It is particularly critical to detect aliases that are intentionally used to hide real identities, such as those of terrorists and fraudsters. Most existing work pays little attention to aliases that have low or no string similarity to the given entities. In this paper, we propose an active-learning-based classifier for detecting this type of alias. To reduce the cost of pairwise comparison, we design a subset-based method that restricts candidate selection to within entity subsets. An active learning classifier is then applied to each entity subset to estimate the probability that a candidate is an alias of a given entity in that subset. Finally, the classifier's results are integrated, and a list of aliases is returned for each given entity. For evaluation, we implemented four state-of-the-art methods and compared them with our approach on three datasets. The results demonstrate that the proposed active learning classifier outperforms these existing methods.
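The abstract does not specify the features, how entity subsets are built, or which classifier is used, so the Python below is only a minimal sketch of the subset, active learning, and integration steps under stated assumptions. Each name is assumed to carry a hypothetical context vector, subsets are assumed to come from a precomputed key, pair evidence is assumed to be the dot product of context vectors (deliberately ignoring spelling), and a plain logistic regression with uncertainty sampling stands in for the paper's classifier. The names `pair_features`, `active_learn`, `detect_aliases`, `seed_size`, and `budget` are all hypothetical, not the authors' API.

```python
import math
from collections import defaultdict


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pair_features(a, b):
    # Hypothetical pair feature: dot product of two context vectors.
    # It ignores how the two names are spelled.
    return [sum(x * y for x, y in zip(a, b))]


def predict(w, x):
    # Estimated probability that the pair described by x is an alias pair.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_logistic(X, y, epochs=200, lr=0.5):
    # Batch gradient-descent logistic regression (stand-in classifier).
    w = [0.0] * (len(X[0]) + 1)  # last entry is the bias
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def active_learn(pool, oracle, seed_size=2, budget=2):
    # Pool-based active learning with uncertainty sampling: label a seed,
    # then repeatedly query the pair whose probability is closest to 0.5.
    pairs = list(pool)
    labelled = {p: oracle(p) for p in pairs[:seed_size]}

    def fit():
        return train_logistic([pool[p] for p in labelled],
                              [labelled[p] for p in labelled])

    for _ in range(budget):
        unlabelled = [p for p in pairs if p not in labelled]
        if not unlabelled:
            break
        w = fit()
        query = min(unlabelled, key=lambda p: abs(predict(w, pool[p]) - 0.5))
        labelled[query] = oracle(query)
    return fit()


def detect_aliases(records, given, oracle, threshold=0.5, **al_kwargs):
    # 1. Partition names into subsets; pairs are only formed inside a subset.
    subsets = defaultdict(dict)
    for name, key, context in records:
        subsets[key][name] = context
    aliases = defaultdict(list)
    for members in subsets.values():
        # Candidate pairs: (given entity, any other name in the same subset).
        pool = {(e, c): pair_features(members[e], members[c])
                for e in members if e in given
                for c in members if c != e}
        if not pool:
            continue
        # 2. Train an active learning classifier for this subset only.
        w = active_learn(pool, oracle, **al_kwargs)
        # 3. Integrate: keep candidates whose probability clears the threshold.
        for (e, c), x in pool.items():
            p = predict(w, x)
            if p > threshold:
                aliases[e].append((c, p))
    return {e: sorted(cs, key=lambda t: -t[1]) for e, cs in aliases.items()}


# Toy data (hypothetical): (name, subset key, context vector).
records = [
    ("Alice", 0, [1.0, 0.0]),
    ("Ally", 0, [1.0, 0.0]),
    ("Bob", 0, [0.0, 1.0]),
    ("Rob", 0, [0.0, 1.0]),
    ("Carla", 1, [1.0, 0.0]),  # same context as Alice, different subset
]
identity = {"Alice": "A", "Ally": "A", "Bob": "B", "Rob": "B", "Carla": "C"}


def oracle(pair):
    # Simulated human labeller: 1 if both names denote the same identity.
    return int(identity[pair[0]] == identity[pair[1]])


result = detect_aliases(records, {"Alice", "Bob"}, oracle, seed_size=2, budget=2)
print({e: [c for c, _ in cs] for e, cs in result.items()})
# {'Alice': ['Ally'], 'Bob': ['Rob']}
```

In this toy run, Carla has the same context vector as Alice but sits in a different subset, so she is never paired with Alice. That illustrates the comparison savings the subset restriction aims for, and also its trade-off: a true alias placed in the wrong subset could never be found.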