The integration of distributed, heterogeneous databases, such as those available on the World Wide Web, poses many problems. Here we consider the problem of integrating data from sources that lack common object identifiers. A solution to this problem is proposed for databases that contain informal, natural-language "names" for objects; most Web-based databases satisfy this requirement, since they usually present their information to the end-user through a veneer of text. We describe WHIRL, a "soft" database management system that supports "similarity joins," based on certain robust, general-purpose similarity metrics for text. This enables fragments of text (e.g., informal names of objects) to be used as keys. WHIRL includes textual objects as a built-in type and similarity reasoning as a built-in predicate, and it answers every query with a list of answer substitutions ranked according to an overall score. Experiments show that WHIRL is much faster than naive inference methods, even for short queries, and efficient on typical queries to real-world databases with tens of thousands of tuples. Inferences made by WHIRL are also surprisingly accurate, equaling the accuracy of hand-coded normalization routines on one benchmark problem, and outperforming exact matching with a plausible global domain on a second.
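To make the idea of a ranked similarity join concrete, here is a minimal sketch in Python. It assumes a TF-IDF weighted cosine similarity as the text metric (a standard vector-space measure; the abstract says only "robust, general-purpose similarity metrics") and uses hypothetical helper names (`tokenize`, `tfidf_vectors`, `similarity_join`). It is deliberately the brute-force, compare-every-pair approach, i.e. the kind of naive baseline the abstract contrasts WHIRL against, not WHIRL's own search strategy. Each result is an answer substitution (a pair of name strings) ranked by score.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and treat any non-alphanumeric character as a separator.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def tfidf_vectors(docs):
    # Build unit-length TF-IDF vectors (dicts of term -> weight) for short text fragments.
    n = len(docs)
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Log-scaled term frequency times inverse document frequency.
        vec = {t: (math.log(tf[t]) + 1.0) * math.log(n / df[t]) for t in tf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm > 0 else {})
    return vectors


def similarity_join(left, right, k=10):
    # Hypothetical naive similarity join: score every (left, right) pair by cosine
    # similarity and return the top-k pairs with a positive score, best first.
    vecs = tfidf_vectors(left + right)  # IDF computed over both columns together
    left_vecs, right_vecs = vecs[: len(left)], vecs[len(left):]
    scored = []
    for a_text, a in zip(left, left_vecs):
        for b_text, b in zip(right, right_vecs):
            # Vectors are unit length, so the dot product is the cosine similarity.
            score = sum(w * b.get(t, 0.0) for t, w in a.items())
            if score > 0:
                scored.append((score, a_text, b_text))
    scored.sort(key=lambda triple: triple[0], reverse=True)
    return scored[:k]


companies = ["Acme Widgets Inc.", "Globex Corporation"]
listings = ["ACME widgets incorporated", "Globex Corp", "Initech"]
result = similarity_join(companies, listings)
for score, a, b in result:
    print(f"{score:.3f}  {a!r} ~ {b!r}")
```

Note how the informal names act as join keys even though neither pair matches exactly: the pairs share rare-ish words ("acme", "widgets", "globex"), and the pair sharing two such words outranks the pair sharing one. The nested loop costs one comparison per pair of tuples, which is why an indexed or best-first search strategy matters at the scale of tens of thousands of tuples.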