Automatically utilizing secondary sources to align information across sources

Authors:
Martin Michalowski; Snehal Thakkar; Craig A. Knoblock
Venue:
AI Magazine - Special issue on semantic integration
Year:
2005

References: 21
Cited by: 5

The merge/purge problem for large databases. SIGMOD '95: Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data.
Bagging predictors. Machine Learning.
InfoSleuth: agent-based semantic integration of information in open and dynamic environments. SIGMOD '97: Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data.
Infomaster: an information integration system. SIGMOD '97: Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data.
Generating finite-state transducers for semi-structured data extraction from the Web. Information Systems - Special issue on semistructured data.
Learning object identification rules for information integration. Information Systems - Data extraction, cleaning and reconciliation.
Hierarchical wrapper induction for semistructured information sources. Autonomous Agents and Multi-Agent Systems.
Query learning strategies using boosting and bagging. ICML '98: Proceedings of the Fifteenth International Conference on Machine Learning.
Schema mapping as query discovery. VLDB '00: Proceedings of the 26th International Conference on Very Large Data Bases.
Potter's Wheel: an interactive data cleaning system. Proceedings of the 27th International Conference on Very Large Data Bases.
Querying heterogeneous information sources using source descriptions. VLDB '96: Proceedings of the 22nd International Conference on Very Large Data Bases.
Interactive deduplication using active learning. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Learning domain-independent string transformation weights for high accuracy object identification. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Efficient record linkage in large data sets. DASFAA '03: Proceedings of the Eighth International Conference on Database Systems for Advanced Applications.
Robust and efficient fuzzy match for online data cleaning. Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data.
Adaptive duplicate detection using learnable string similarity measures. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
iMAP: discovering complex semantic matches between database schemas. SIGMOD '04: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data.
Profile-based object matching for information integration. IEEE Intelligent Systems.
Composing mappings among data sources. VLDB '03: Proceedings of the 29th International Conference on Very Large Data Bases, Volume 29.
Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research.
Query-answering algorithms for information agents. AAAI '96: Proceedings of the Thirteenth National Conference on Artificial Intelligence, Volume 1.

Composing, optimizing, and executing plans for bioinformatics web services. The VLDB Journal: The International Journal on Very Large Data Bases.
Extracting geographic features from the Internet to automatically build detailed regional gazetteers. International Journal of Geographical Information Science.
Creating relational data from unstructured and ungrammatical data sources. Journal of Artificial Intelligence Research.
Semantic annotation of unstructured and ungrammatical text. IJCAI '05: Proceedings of the 19th International Joint Conference on Artificial Intelligence.
Active learning strategies for the deduplication of electronic patient data using classification trees. Journal of Biomedical Informatics.

Abstract

XML, web services, and the Semantic Web have opened the door to new and exciting information-integration applications. Information sources on the web are controlled by different organizations or people, use different text formats, and contain varying inconsistencies. Therefore, any system that integrates information from different data sources must identify the entities these sources have in common. However, many web data sources do not contain enough information for state-of-the-art record-linkage systems to link their records accurately. It is possible, though, to exploit secondary data sources on the web to improve the record-linkage process.

We present an approach to accurately and automatically match entities from various data sources by using a state-of-the-art record-linkage system in conjunction with a data-integration system. The data-integration system automatically determines which secondary sources need to be queried when linking records from various data sources. The record-linkage system then uses this additional information to improve the accuracy of the linkage between datasets.
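The abstract describes the approach only at a high level. The Python sketch below illustrates the general idea under stated assumptions and is not the authors' system. The two record sets, the attribute names, the phone directory used as a secondary source, and the functions `choose_sources`, `augment`, `score`, and `link` are all hypothetical. The standard-library `difflib.SequenceMatcher` ratio stands in for a real record-linkage system's similarity measure, and source selection is a simple attribute-coverage heuristic rather than a full data-integration planner.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]
# Hypothetical secondary source: (input attribute, output attribute, lookup function).
SecondarySource = Tuple[str, str, Callable[[str], Optional[str]]]


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in for a learned measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(r: Record, s: Record) -> float:
    """Average similarity over the non-id attributes both records share."""
    shared = [k for k in r if k in s and k != "id"]
    if not shared:
        return 0.0
    return sum(similarity(r[k], s[k]) for k in shared) / len(shared)


def link(left: List[Record], right: List[Record], threshold: float = 0.9) -> List[Tuple[str, str]]:
    """Pair each left record with its best right record, if that best is unique and above threshold."""
    matches = []
    for r in left:
        scored = [(score(r, s), s["id"]) for s in right]
        best = max(sc for sc, _ in scored)
        best_ids = [sid for sc, sid in scored if sc == best]
        if best >= threshold and len(best_ids) == 1:
            matches.append((r["id"], best_ids[0]))
    return matches


def choose_sources(left: List[Record], right: List[Record],
                   registry: List[SecondarySource]) -> List[SecondarySource]:
    """Heuristic: pick sources whose input is available on the left and whose output
    attribute appears on the right but is missing on the left."""
    left_attrs = set().union(*(r.keys() for r in left))
    right_attrs = set().union(*(r.keys() for r in right))
    return [src for src in registry
            if src[0] in left_attrs and src[1] in right_attrs and src[1] not in left_attrs]


def augment(records: List[Record], sources: List[SecondarySource]) -> List[Record]:
    """Query each chosen secondary source and add the returned attribute to a copy of each record."""
    out = []
    for r in records:
        r2 = dict(r)
        for inp, outp, lookup in sources:
            if inp in r2 and outp not in r2:
                value = lookup(r2[inp])
                if value is not None:
                    r2[outp] = value
        out.append(r2)
    return out


# Hypothetical data: two branches of the same chain, so names alone cannot disambiguate.
left = [
    {"id": "a1", "name": "Starbucks", "phone": "310-555-0101"},
    {"id": "a2", "name": "Starbucks", "phone": "310-555-0202"},
]
right = [
    {"id": "b1", "name": "Starbucks", "address": "100 Main St"},
    {"id": "b2", "name": "Starbucks", "address": "200 Ocean Ave"},
]
# Hypothetical secondary source: a phone directory mapping phone numbers to addresses.
phone_directory = {"310-555-0101": "100 Main St", "310-555-0202": "200 Ocean Ave"}
registry: List[SecondarySource] = [("phone", "address", phone_directory.get)]

if __name__ == "__main__":
    print(link(left, right))  # → []
    sources = choose_sources(left, right, registry)
    print(link(augment(left, sources), right))  # → [('a1', 'b1'), ('a2', 'b2')]
```

Without the secondary source, each "Starbucks" record on the left ties with both records on the right, so no unique match clears the threshold. Querying the phone directory adds an `address` attribute to the left records, which breaks the tie and lets the linkage step pair each branch correctly.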