Entity identification, the task of building the correspondence between objects and entities in dirty data, plays an important role in data cleaning. Dirty data often results from confusion between entities and their names: different entities may share an identical name, and different names may refer to the same entity. The major task of entity identification is therefore to distinguish entities that share a name and to recognize different names that refer to the same entity. However, existing research addresses only one of these two aspects and so cannot solve the problem completely. This paper proposes EIF, an entity identification framework that takes both kinds of confusion into account. By combining effective clustering techniques, approximate string matching algorithms, and a flexible mechanism for knowledge integration, EIF can be applied to many different kinds of entity identification problems. As an application of EIF, we solve the author identification problem. Extensive experiments verify the effectiveness of the framework.
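To make the two kinds of confusion concrete, the following is a minimal sketch and not the EIF algorithm itself, whose details the abstract does not give. It assumes each record is a (name, context) pair, where the context is a set of co-author names, and it links two records only when their names match approximately and their contexts overlap. The function names, the 0.85 threshold, the `min_shared` parameter, and the sample records are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold=0.85):
    """Approximate string match on lowercased names (threshold is an assumed value)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def identify(records, name_threshold=0.85, min_shared=1):
    """Group record indices into entities.

    records: list of (name, set_of_context_items) pairs.
    Two records are linked when their names approximately match AND their
    contexts share at least `min_shared` items. Entities are the connected
    components of the resulting link graph, computed with union-find.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        (name_i, ctx_i), (name_j, ctx_j) = records[i], records[j]
        if similar(name_i, name_j, name_threshold) and len(ctx_i & ctx_j) >= min_shared:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical author records: (author name, set of co-authors).
records = [
    ("Hongzhi Wang", {"Jianzhong Li", "Hong Gao"}),
    ("Hong-zhi Wang", {"Jianzhong Li"}),   # different spelling, shared co-author
    ("Hongzhi Wang", {"Alice Smith"}),     # same spelling, unrelated co-authors
]
print(identify(records))
```

In this toy input, records 0 and 1 are merged despite the spelling difference, while record 2 stays separate despite the identical name, so both kinds of confusion are handled in one pass. A real system would need a more careful similarity measure and clustering criterion than this pairwise rule.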