Self-tuning in graph-based reference disambiguation

Authors:
Rabia Nuray-Turan;Dmitri V. Kalashnikov;Sharad Mehrotra
Affiliations:
Computer Science Department, University of California, Irvine;Computer Science Department, University of California, Irvine;Computer Science Department, University of California, Irvine
Venue:
DASFAA'07 Proceedings of the 12th international conference on Database systems for advanced applications
Year:
2007

Citing 20
Cited 6

The merge/purge problem for large databases

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Efficient clustering of high-dimensional data sets with application to reference matching

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Hardening soft information sources

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Interactive deduplication using active learning

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Learning domain-independent string transformation weights for high accuracy object identification

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Adaptive duplicate detection using learnable string similarity measures

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Algorithms for estimating relative importance in networks

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Kernel Methods for Pattern Analysis

Kernel Methods for Pattern Analysis
Fast discovery of connection subgraphs

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Disambiguating Web appearances of people in a social network

WWW '05 Proceedings of the 14th international conference on World Wide Web
Reference reconciliation in complex information spaces

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Data cleaning in microsoft SQL server 2005

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Exploiting relationships for object consolidation

Proceedings of the 2nd international workshop on Information quality in information systems
Relational clustering for multi-type entity resolution

MRDM '05 Proceedings of the 4th international workshop on Multi-relational mining
Domain-independent data cleaning via analysis of entity-relationship graph

ACM Transactions on Database Systems (TODS)
Contextual search and name disambiguation in email using graphs

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to Operations Research and Revised CD-ROM 8

Introduction to Operations Research and Revised CD-ROM 8
Eliminating fuzzy duplicates in data warehouses

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Identification and tracing of ambiguous names: discriminative and generative approaches

AAAI'04 Proceedings of the 19th national conference on Artifical intelligence

Adaptive graphical approach to entity resolution

Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Towards breaking the quality curse.: a web-querying approach to web people search.

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Exploiting context analysis for combining multiple entity resolution systems

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Attribute and object selection queries on objects with probabilistic attributes

ACM Transactions on Database Systems (TODS)
Exploiting Web querying for Web people search

ACM Transactions on Database Systems (TODS)
Adaptive Connection Strength Models for Relationship-Based Entity Resolution

Journal of Data and Information Quality (JDIQ) - Special Issue on Entity Resolution

Quantified Score

Hi-index	0.00

Visualization

Abstract

Nowadays many data mining/analysis applications use the graph analysis techniques for decision making. Many of these techniques are based on the importance of relationships among the interacting units. A number of models and measures that analyze the relationship importance (link structure) have been proposed (e.g., centrality, importance and page rank) and they are generally based on intuition, where the analyst intuitively decides a reasonable model that fits the underlying data. In this paper, we address the problem of learning such models directly from training data. Specifically, we study a way to calibrate a connection strength measure from training data in the context of reference disambiguation problem. Experimental evaluation demonstrates that the proposed model surpasses the best model used for reference disambiguation in the past, leading to better quality of reference disambiguation.