Part of the process of data integration is determining which sets of identifiers refer to the same real-world entities. In integrating databases found on the Web or obtained by using information extraction methods, it is often possible to solve this problem by exploiting similarities in the textual names used for objects in different databases. In this paper we describe techniques for clustering and matching identifier names that are both scalable and adaptive, in the sense that they can be trained to obtain better performance in a particular domain. An experimental evaluation on a number of sample datasets shows that the adaptive method sometimes performs much better than either of two non-adaptive baseline systems, and is nearly always competitive with the best baseline system.
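To make the idea of matching identifiers by textual similarity concrete, here is a minimal sketch of a non-adaptive matcher. It is not the method described in this paper: the token-based Jaccard measure, the function names (`tokens`, `jaccard`, `match_names`), the example names, and the fixed threshold of 0.5 are all assumptions chosen for illustration. An adaptive approach would learn the similarity function or the threshold from labeled examples in the target domain rather than fixing them by hand.

```python
def tokens(name):
    """Lowercase a name and split it into a set of word tokens.

    Commas and periods are treated as separators (an illustrative choice).
    """
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return set(cleaned.split())


def jaccard(a, b):
    """Jaccard similarity between the token sets of two names."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_names(left, right, threshold=0.5):
    """Pair each name in `left` with its most similar name in `right`.

    A pair is kept only if its similarity reaches `threshold`, which is
    a hand-picked (hypothetical) value here, not a learned one.
    """
    pairs = []
    for a in left:
        best = max(right, key=lambda b: jaccard(a, b), default=None)
        if best is not None and jaccard(a, best) >= threshold:
            pairs.append((a, best))
    return pairs


if __name__ == "__main__":
    db1 = ["AT&T Labs Research", "Carnegie Mellon University"]
    db2 = ["Carnegie Mellon Univ.", "AT&T Labs"]
    for a, b in match_names(db1, db2):
        print(f"{a!r} <-> {b!r}")
```

The inner loop compares every pair of names, which is quadratic in the database sizes. Scalable techniques of the kind the abstract refers to avoid comparing all pairs; this sketch makes no attempt to do so.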