Duplicate elimination, the problem of detecting multiple tuples that describe the same real-world entity, is an important data cleaning problem. Previous domain-independent solutions to this problem relied on standard textual similarity functions (e.g., edit distance, cosine metric) between multi-attribute tuples. However, such approaches produce large numbers of false positives when we want them to identify domain-specific abbreviations and conventions. In this paper, we develop an algorithm for eliminating duplicates in the dimensional tables of a data warehouse, which are usually associated with hierarchies. We exploit these hierarchies to develop a high-quality, scalable duplicate elimination algorithm, and we evaluate it on real datasets from an operational data warehouse.
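The abstract contrasts standard textual similarity functions with an approach that exploits dimension hierarchies. The Python sketch below illustrates that contrast under stated assumptions. `edit_distance` is the usual Levenshtein distance and `cosine_similarity` compares token-frequency vectors, matching the two measures the abstract names. The hierarchy-aware part is an illustrative assumption of ours and is not the paper's algorithm: `hierarchical_candidates` compares only rows that share a parent and also treats overlap of their child sets as evidence. Its function names, data layout (`rows`, `children`), and thresholds (`text_t`, `child_t`) are all hypothetical.

```python
from collections import Counter
from math import sqrt


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between lower-cased token-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def child_overlap(c1: set, c2: set) -> float:
    """Jaccard overlap of two child sets (e.g. the cities recorded under two states)."""
    union = c1 | c2
    return len(c1 & c2) / len(union) if union else 0.0


def hierarchical_candidates(rows, children, text_t=0.5, child_t=0.3):
    """Hypothetical hierarchy-aware candidate generation (not the paper's method).

    rows:     key -> (parent_key, name) for one level of a dimension hierarchy
    children: key -> set of child names one level below
    Only rows sharing a parent are compared. A pair is reported when either
    the textual similarity or the child-set overlap reaches its threshold.
    """
    by_parent = {}
    for key, (parent, name) in rows.items():
        by_parent.setdefault(parent, []).append((key, name))
    pairs = []
    for group in by_parent.values():
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                (k1, n1), (k2, n2) = group[i], group[j]
                text_ok = cosine_similarity(n1, n2) >= text_t
                child_ok = child_overlap(children.get(k1, set()),
                                         children.get(k2, set())) >= child_t
                if text_ok or child_ok:
                    pairs.append((k1, k2))
    return pairs


if __name__ == "__main__":
    # Made-up State-level rows under a Country parent.
    rows = {1: ("USA", "MO"), 2: ("USA", "Missouri"), 3: ("USA", "Montana")}
    children = {
        1: {"St. Louis", "Kansas City", "Joplin"},
        2: {"St. Louis", "Columbia", "Joplin"},
        3: {"Helena", "Billings"},
    }
    # Textual measures see the abbreviation "MO" and "Missouri" as far apart.
    print(edit_distance("MO", "Missouri"), cosine_similarity("MO", "Missouri"))  # prints 7 0.0
    # The cities shared under the two rows still link them.
    print(hierarchical_candidates(rows, children))  # prints [(1, 2)]
```

Grouping by parent keeps comparisons within small groups instead of across the whole table. The OR rule that combines textual and child-set evidence is only one possible choice, and any real use would need thresholds calibrated on the data at hand.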