To ensure high data quality, data warehouses must validate and cleanse incoming data tuples from external sources. In many situations, clean tuples must match acceptable tuples in reference tables. For example, the product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation.

A significant challenge in such a scenario is to implement an efficient and accurate fuzzy match operation that can clean an incoming tuple when it fails to match any tuple in the reference relation exactly. In this paper, we propose a new similarity function that overcomes limitations of commonly used similarity functions, and we develop an efficient fuzzy match algorithm. We demonstrate the effectiveness of our techniques by evaluating them on real datasets.
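The abstract does not define the proposed similarity function. The Python sketch below only illustrates the general shape of the fuzzy match operation it describes: try an exact lookup first, and otherwise return the most similar reference tuple above a threshold.

Everything concrete here is a hypothetical assumption, not the paper's definition:

- the class `FuzzyMatcher`;
- IDF token weights computed over the reference relation;
- a weighted token-level edit distance whose replacement cost is the normalized character edit distance;
- the `insert_factor` discount;
- the mean-IDF weight for unseen tokens;
- the 0.7 threshold.

The lookup is a plain linear scan over the reference relation. It does not reproduce the paper's efficient fuzzy match algorithm, which is what makes the operation practical at scale.

```python
import math
from collections import Counter


def char_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = curr
    return prev[-1]


class FuzzyMatcher:
    """Hypothetical fuzzy matcher over a reference relation of strings.

    Tokens are weighted by an IDF computed over the reference relation, so
    errors in rare (informative) tokens cost more than errors in common ones.
    """

    def __init__(self, reference, insert_factor=0.5):
        self.raw = list(reference)
        self.tokens = [self._tokenize(r) for r in self.raw]
        self.exact = {tuple(t): i for i, t in enumerate(self.tokens)}
        self.insert_factor = insert_factor  # assumed discount for inserted tokens
        n = len(self.tokens)
        df = Counter(tok for toks in self.tokens for tok in set(toks))
        self.idf = {tok: math.log(n / c) + 1.0 for tok, c in df.items()}
        # Assumption: tokens unseen in the reference get the mean IDF.
        self.default_weight = (sum(self.idf.values()) / len(self.idf)
                               if self.idf else 1.0)

    @staticmethod
    def _tokenize(text):
        return text.lower().split()

    def weight(self, tok):
        return self.idf.get(tok, self.default_weight)

    def transformation_cost(self, u, v):
        """Weighted token-level edit distance turning input u into reference v."""
        m, n = len(u), len(v)
        dp = [[0.0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            dp[i][0] = dp[i - 1][0] + self.weight(u[i - 1])  # delete input token
        for j in range(1, n + 1):
            dp[0][j] = dp[0][j - 1] + self.insert_factor * self.weight(v[j - 1])
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                a, b = u[i - 1], v[j - 1]
                # Replacement: normalized character edit distance,
                # scaled by the weight of the input token.
                replace = (char_edit_distance(a, b) / max(len(a), len(b))
                           * self.weight(a))
                dp[i][j] = min(dp[i - 1][j] + self.weight(a),
                               dp[i][j - 1] + self.insert_factor * self.weight(b),
                               dp[i - 1][j - 1] + replace)
        return dp[m][n]

    def similarity(self, u, v):
        """1 - cost / (total input weight), clipped to [0, 1]."""
        total = sum(self.weight(t) for t in u)
        if total == 0:
            return 0.0
        return 1.0 - min(self.transformation_cost(u, v) / total, 1.0)

    def match(self, text, threshold=0.7):
        """Return (reference tuple, similarity), or (None, best similarity)."""
        u = self._tokenize(text)
        idx = self.exact.get(tuple(u))
        if idx is not None:  # exact match after lowercasing and tokenizing
            return self.raw[idx], 1.0
        best_idx, best_sim = None, 0.0
        for i, v in enumerate(self.tokens):  # linear scan, for illustration only
            sim = self.similarity(u, v)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx is None or best_sim < threshold:
            return None, best_sim
        return self.raw[best_idx], best_sim


if __name__ == "__main__":
    matcher = FuzzyMatcher(["Boeing Company Seattle WA 98004",
                            "Bon Corporation Seattle WA 98014",
                            "Companions Seattle WA 98024"])
    # Prints the Boeing Company tuple with a similarity of about 0.93.
    print(matcher.match("Boeing Compnay Seattle WA 98004"))
```

Weighting tokens rather than counting raw characters means a typo in a rare token (such as a company name or zip code) is penalized more heavily than a typo in a token shared by many reference tuples. This matches the abstract's motivation of going beyond commonly used similarity functions, though the exact cost model above is only an assumption.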