Improving Grouped-Entity Resolution Using Quasi-Cliques

Authors:
Byung-Won On;Ergin Elmacioglu;Dongwon Lee;Jaewoo Kang;Jian Pei
Affiliations:
The Pennsylvania State University, USA;The Pennsylvania State University, USA;The Pennsylvania State University, USA;NCSU & Korea Univ., Korea;Simon Fraser Univ., Canada
Venue:
ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Year:
2006

Citing 0
Cited 10

Adaptive graphical approach to entity resolution. Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Reconciliando dados de cunho acadêmico. SBBD '08 Proceedings of the 23rd Brazilian symposium on Databases
idMesh: graph-based disambiguation of linked data. Proceedings of the 18th international conference on World wide web
Exploiting context analysis for combining multiple entity resolution systems. Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Multiple relationship based deduplication. Proceedings of the Fourth SIGMOD PhD Workshop on Innovative Database Research
On Graph-Based Name Disambiguation. Journal of Data and Information Quality (JDIQ)
Active associative sampling for author name disambiguation. Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
Multiple instance learning for group record linkage. PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
A brief survey of automatic methods for author name disambiguation. ACM SIGMOD Record
A relevance feedback approach for the author name disambiguation problem. Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries

Quantified Score

Hi-index	0.00

Abstract

The entity resolution (ER) problem, which identifies duplicate entities that refer to the same real-world entity, is essential in many applications. In this paper, we focus on resolving entities that contain a group of related elements (e.g., an author entity with a list of citations, a singer entity with a song list, or an intermediate result of a SQL GROUP BY query). Such entities, which we call grouped-entities, occur frequently in many applications. Previous approaches to grouped-entity resolution often rely on textual similarity and produce a large number of false positives. As a complementary technique, we present our experience of applying a recently proposed graph mining technique, Quasi-Clique, atop conventional ER solutions. Our approach exploits contextual information mined from the group of elements in each entity, in addition to syntactic similarity. Extensive experiments verify that our proposal improves precision and recall by up to 83% when used together with a variety of existing ER solutions, and never worsens them.
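
The abstract does not define a quasi-clique. Under the commonly used definition, a set of n vertices is a γ-quasi-clique if every vertex in the set is adjacent to at least ⌈γ(n − 1)⌉ of the others, so γ = 1 gives an ordinary clique. The sketch below only checks that condition on a toy graph. The function name, the adjacency-dictionary representation, and the example co-author graph are all illustrative assumptions, not the paper's implementation or data.

```python
from math import ceil


def is_quasi_clique(adjacency, vertices, gamma):
    """Check whether `vertices` forms a gamma-quasi-clique.

    `adjacency` maps each vertex to the set of its neighbours
    (hypothetical representation chosen for this sketch).
    """
    members = set(vertices)
    if not members:
        return False
    # Minimum number of neighbours each vertex needs inside the set.
    needed = ceil(gamma * (len(members) - 1))
    for v in members:
        inside = adjacency.get(v, set()) & (members - {v})
        if len(inside) < needed:
            return False
    return True


# Toy, made-up co-author graph: every pair is linked except a and d.
coauthors = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}

loose = is_quasi_clique(coauthors, ["a", "b", "c", "d"], 0.6)
strict = is_quasi_clique(coauthors, ["a", "b", "c", "d"], 1.0)
triangle = is_quasi_clique(coauthors, ["a", "b", "c"], 1.0)
```

Here the four-vertex set passes at γ = 0.6 (each vertex needs two neighbours inside the set) but fails at γ = 1.0 because a and d are not linked, while the triangle {a, b, c} is a full clique.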