To enable information integration, schema matching is a critical step for discovering semantic correspondences of attributes across heterogeneous sources. Complex matchings are common, but because their search space is far larger, most existing techniques focus on simple 1:1 matchings. To tackle this challenge, this article takes a conceptually novel approach by viewing schema matching as correlation mining, for our task of matching Web query interfaces to integrate the myriad databases on the Internet. On this “deep Web,” query interfaces generally form complex matchings between attribute groups (e.g., {author} corresponds to {first name, last name} in the Books domain). We observe that the co-occurrence patterns across query interfaces often reveal such complex semantic relationships: grouping attributes (e.g., {first name, last name}) tend to be co-present in query interfaces and are thus positively correlated. In contrast, synonym attributes are negatively correlated because they rarely co-occur. This insight enables us to discover complex matchings by a correlation mining approach. In particular, we develop the DCM framework, which consists of data preprocessing, dual mining of positive and negative correlations, and finally matching construction. We evaluate the DCM framework on manually extracted interfaces, and the results show good accuracy for discovering complex matchings. Further, to automate the entire matching process, we incorporate automatic techniques for interface extraction. Executing the DCM framework on automatically extracted interfaces, we find that the inevitable errors in automatic interface extraction may significantly affect the matching result. To make the DCM framework robust against such “noisy” schemas, we integrate it with a novel “ensemble” approach, which creates an ensemble of DCM matchers by randomizing the schema data into many trials and aggregating their ranked results by majority voting.
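The co-occurrence observation above can be illustrated with a minimal sketch. It is not the DCM framework's actual correlation measure or mining procedure: it substitutes the plain Jaccard coefficient as a stand-in, and the function names, thresholds (`group_min`, `synonym_max`, `min_support`), and toy schemas are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b, schemas):
    """Fraction of schemas containing a or b that contain both."""
    both = sum(1 for s in schemas if a in s and b in s)
    either = sum(1 for s in schemas if a in s or b in s)
    return both / either if either else 0.0


def mine_pairs(schemas, group_min=0.6, synonym_max=0.1, min_support=2):
    """Split attribute pairs into grouping and synonym candidates.

    Illustrative thresholds only: pairs that are usually co-present are
    grouping candidates, pairs that almost never co-occur are synonym
    candidates. Attributes seen in fewer than min_support schemas are skipped.
    """
    attrs = sorted(set().union(*schemas))
    support = {a: sum(1 for s in schemas if a in s) for a in attrs}
    groups, synonyms = [], []
    for a, b in combinations(attrs, 2):
        if support[a] < min_support or support[b] < min_support:
            continue
        score = jaccard(a, b, schemas)
        if score >= group_min:
            groups.append((a, b))
        elif score <= synonym_max:
            synonyms.append((a, b))
    return groups, synonyms


# Toy query interfaces for a Books-like domain (made-up data).
schemas = [
    {"author", "title"},
    {"author", "title", "isbn"},
    {"first name", "last name", "title"},
    {"first name", "last name", "title", "isbn"},
]
groups, synonyms = mine_pairs(schemas)
print(groups)    # [('first name', 'last name')]
print(synonyms)  # [('author', 'first name'), ('author', 'last name')]
```

On this toy input, {first name, last name} surfaces as a grouping candidate and each of its members as a synonym candidate for author, which mirrors the {author} to {first name, last name} example. Rare co-occurrence alone does not establish synonymy, so a sketch like this only produces candidates.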
As a principled basis, we provide analytic justification of the robustness of the ensemble approach. Empirically, our experiments show that the “ensemblization” indeed significantly boosts the matching accuracy on automatically extracted, and thus noisy, schema data. By employing the DCM framework with the ensemble approach, we thus complete an automatic process of matching Web query interfaces.
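The ensemble idea can be sketched as follows, under simplifying assumptions. The abstract describes aggregating ranked results; this sketch instead takes a majority vote over the set of matchings each trial returns. `ensemble_match`, `toy_matcher`, the parameter names, and the sample data are hypothetical and do not reproduce the article's matcher or its sampling scheme.

```python
import random
from collections import Counter


def ensemble_match(schemas, base_matcher, trials=25, sample_size=None, seed=0):
    """Run base_matcher on random subsets of schemas and keep majority winners.

    Simplification: each trial votes for the matchings it returns, and a
    matching is kept if it appears in more than half of the trials.
    """
    rng = random.Random(seed)
    k = sample_size or max(1, len(schemas) // 2)
    votes = Counter()
    for _ in range(trials):
        sample = rng.sample(schemas, k)  # one randomized trial
        votes.update(set(base_matcher(sample)))
    return sorted(m for m, v in votes.items() if v > trials / 2)


def toy_matcher(sample):
    """Hypothetical base matcher: attribute pairs that never co-occur."""
    attrs = sorted(set().union(*sample))
    return [
        (a, b)
        for i, a in enumerate(attrs)
        for b in attrs[i + 1:]
        if not any(a in s and b in s for s in sample)
    ]


# Made-up schemas; the last one simulates an extraction error that puts
# two synonym attributes on the same interface.
schemas = [
    {"author", "title"}, {"writer", "title"}, {"author", "isbn"},
    {"writer", "isbn"}, {"author", "title"}, {"writer", "title"},
    {"author", "writer", "title"},
]
matches = ensemble_match(schemas, toy_matcher, trials=51, sample_size=4)
print(matches)
```

The intent of the design is that a single noisy schema is absent from many of the sampled trials, so a matching it would suppress on the full data can still collect votes across the ensemble.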