To integrate or link data stored in heterogeneous data sources, a critical problem is entity matching, i.e., matching records across the sources that represent semantically corresponding real-world entities. Decision tree techniques have been used to learn entity matching rules, but most decision tree learners have an inherent representational bias: they generate univariate trees, which restricts the decision boundaries to axis-orthogonal hyperplanes in the feature space. Cascading other classification methods with decision tree learners can alleviate this bias and potentially increase classification accuracy. In this paper, the authors apply a recently developed constrained cascade generalization method to entity matching and report an empirical evaluation using real-world data. The evaluation results show that this method achieves higher classification accuracy than the base classification methods, especially in the case with the dirtiest data.
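The cascading idea above can be illustrated with a small, stdlib-only sketch. It shows the basic, unconstrained form of cascade generalization, not the constrained method the paper applies. A logistic-regression base learner is trained first, its match probability is appended to each comparison vector as an extra attribute, and an axis-parallel learner is then trained on the augmented vectors. To keep the sketch short, a depth-1 decision stump stands in for a full decision tree learner such as C4.5. The two similarity features, the grid data, the "sum above 1.0" matching rule, and all function names (`fit_logistic`, `fit_stump`, `cascade_fit`) are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(w, b, x):
    return sum(wk * xk for wk, xk in zip(w, x)) + b


def fit_logistic(X, y, epochs=500, lr=1.0):
    """Base learner: logistic regression fitted by batch gradient descent.
    Its decision boundary is a single oblique hyperplane."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, t in zip(X, y):
            err = sigmoid(linear_score(w, b, x)) - t
            for k in range(d):
                gw[k] += err * x[k]
            gb += err
        w = [wk - lr * g / n for wk, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def fit_stump(X, y):
    """Depth-1 axis-parallel tree (hypothetical stand-in for C4.5):
    predict 'match' when x[f] > t; return (f, t, training accuracy)."""
    best = (None, None, -1.0)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            hits = sum((x[f] > t) == bool(label) for x, label in zip(X, y))
            if hits / len(X) > best[2]:
                best = (f, t, hits / len(X))
    return best


def cascade_fit(X, y):
    """Unconstrained cascade: append the base learner's P(match) to every
    comparison vector, then fit the axis-parallel learner on the result."""
    w, b = fit_logistic(X, y)
    X_aug = [x + [sigmoid(linear_score(w, b, x))] for x in X]
    return fit_stump(X_aug, y)


# Hypothetical comparison vectors (name_similarity, address_similarity) on a
# 0.1 grid; the hypothetical ground truth is oblique: match iff sum > 1.0.
X = [[i / 10, j / 10] for i in range(11) for j in range(11)]
y = [1 if i + j > 10 else 0 for i in range(11) for j in range(11)]

raw = fit_stump(X, y)
cascaded = cascade_fit(X, y)
print("stump on raw similarities: feature %d, accuracy %.3f" % (raw[0], raw[2]))
print("stump after cascading:     feature %d, accuracy %.3f" % (cascaded[0], cascaded[2]))
```

Under these assumptions, the stump on raw similarities can only test one similarity at a time, so it misclassifies part of the grid. The cascaded stump can instead split on the appended probability (feature index 2), which already encodes the oblique boundary. This is one way to see why cascading relaxes the axis-orthogonal bias.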