Many databases contain uncertain and imprecise references to real-world entities. Because identifiers for the underlying entities are often absent, a database may contain multiple references to the same entity. This can lead not only to data redundancy but also to inaccuracies in query processing and knowledge extraction. These problems can be alleviated through entity resolution, which involves discovering the underlying entities and mapping each database reference to the entity it denotes. Traditionally, entities are resolved using pairwise similarity over the attributes of references. However, the data often contains additional relational information: references to different entities may co-occur. In these cases, collective entity resolution, in which the entities for co-occurring references are determined jointly rather than independently, can improve entity resolution accuracy. We propose a novel relational clustering algorithm that uses both attribute and relational information to determine the underlying domain entities, and we give an efficient implementation. We investigate the impact that different relational similarity measures have on entity resolution quality. We evaluate our collective entity resolution algorithm on multiple real-world databases and show that it improves entity resolution performance over both attribute-based baselines and algorithms that consider relational information but do not resolve entities collectively. In addition, we perform detailed experiments on synthetically generated data to identify data characteristics that favor collective relational resolution over purely attribute-based algorithms.
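To make the idea of combining attribute and relational evidence concrete, the following is a minimal illustrative sketch, not the algorithm or implementation evaluated in this work. It assumes a greedy agglomerative clustering in which the similarity of two clusters is a weighted sum of an attribute score and a relational score. Everything named here is hypothetical: the toy `name_similarity` measure, the choice of Jaccard overlap between cluster neighborhoods as the relational measure, the weight `alpha`, the merge `threshold`, and the example references.

```python
from itertools import combinations


def name_similarity(a, b):
    """Toy attribute measure (assumes 'First Last' names): 1.0 if identical,
    0.5 if surnames match and one first name is a compatible initial, else 0.0."""
    if a == b:
        return 1.0
    first_a, last_a = a.rsplit(" ", 1)
    first_b, last_b = b.rsplit(" ", 1)
    if last_a != last_b:
        return 0.0
    fa, fb = first_a.rstrip("."), first_b.rstrip(".")
    is_initial = len(fa) == 1 or len(fb) == 1
    return 0.5 if is_initial and fa[0] == fb[0] else 0.0


def jaccard(x, y):
    """Jaccard coefficient of two sets; 0.0 when both are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def resolve(references, alpha=0.4, threshold=0.5):
    """Cluster (ref_id, name, group_id) triples; returns ref_id -> cluster label.

    References sharing a group_id co-occur (for example, co-authors of a paper).
    alpha weights relational against attribute similarity (both hypothetical knobs).
    """
    label = {rid: rid for rid, _, _ in references}
    name = {rid: n for rid, n, _ in references}
    by_group = {}
    for rid, _, gid in references:
        by_group.setdefault(gid, []).append(rid)
    cooccur = {rid: set() for rid in label}
    for members in by_group.values():
        for a, b in combinations(members, 2):
            cooccur[a].add(b)
            cooccur[b].add(a)

    def members_of(cluster):
        return [r for r in label if label[r] == cluster]

    def neighborhood(cluster):
        # Current cluster labels of all references co-occurring with this cluster.
        # Because labels change as clusters merge, earlier merges influence later ones.
        return {label[n] for r in members_of(cluster) for n in cooccur[r]}

    def similarity(c1, c2):
        attr = max(name_similarity(name[a], name[b])
                   for a in members_of(c1) for b in members_of(c2))
        rel = jaccard(neighborhood(c1), neighborhood(c2))
        return (1 - alpha) * attr + alpha * rel

    while True:
        best_score, best_pair = threshold, None
        for c1, c2 in combinations(sorted(set(label.values())), 2):
            score = similarity(c1, c2)
            if score > best_score:
                best_score, best_pair = score, (c1, c2)
        if best_pair is None:
            return label
        keep, drop = best_pair
        for r in members_of(drop):
            label[r] = keep


# Hypothetical data: is "J. Smith" on paper p2 John Smith or Jane Smith?
refs = [
    ("r1", "John Smith", "p1"), ("r2", "Ana Kumar", "p1"),
    ("r3", "J. Smith", "p2"), ("r4", "Ana Kumar", "p2"),
    ("r5", "Jane Smith", "p3"), ("r6", "Bo Lopez", "p3"),
]
labels = resolve(refs)
```

In this toy run the two "Ana Kumar" references merge first on attribute evidence alone. That merge gives "J. Smith" and "John Smith" a shared neighbor, which lifts their combined score above the threshold, while "Jane Smith" stays separate. Setting `alpha=0` (attributes only) either leaves "J. Smith" unresolved or, with a lower threshold, merges all three Smith references.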