SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Reconciling schemas of disparate data sources: a machine-learning approach
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Generic Schema Matching with Cupid
Proceedings of the 27th International Conference on Very Large Data Bases
A survey of approaches to automatic schema matching
The VLDB Journal — The International Journal on Very Large Data Bases
Text joins in an RDBMS for web data integration
WWW '03 Proceedings of the 12th international conference on World Wide Web
Similarity Flooding: A Versatile Graph Matching Algorithm and Its Application to Schema Matching
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Keyword Searching and Browsing in Databases using BANKS
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Evaluating top-k queries over web-accessible databases
ACM Transactions on Database Systems (TODS)
Understanding Web query interfaces: best-effort parsing with hidden syntax
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
RankSQL: query algebra and optimization for relational top-k queries
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Bidirectional expansion for keyword search on graph databases
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Contextual search and name disambiguation in email using graphs
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Matching large schemas: Approaches and evaluation
Information Systems
BLINKS: ranked keyword searches on graphs
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Online Passive-Aggressive Algorithms
The Journal of Machine Learning Research
Discover: keyword search in relational databases
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Objectrank: authority-based keyword search in databases
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Data integration with uncertainty
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Video suggestion and discovery for youtube: taking random walks through the view graph
Proceedings of the 17th international conference on World Wide Web
Pay-as-you-go user feedback for dataspace systems
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Toward best-effort information extraction
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Proceedings of the 13th international conference on Intelligent user interfaces
Fine-grained relevance feedback for XML retrieval
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
WebTables: exploring the power of tables on the web
Proceedings of the VLDB Endowment
Learning to create data-integrating queries
Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment
STAR: Steiner-Tree Approximation in Relationship Graphs
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Efficiently incorporating user feedback into information extraction and integration programs
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Weakly-supervised acquisition of labeled class instances using graph random walks
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Sharing work in keyword search over databases
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Layered graph data model for data management of dataspace support platform
WAIM'11 Proceedings of the 12th international conference on Web-age information management
DSToolkit: an architecture for flexible dataspace management
Transactions on Large-Scale Data- and Knowledge-Centered Systems V
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Feedback-based data set recommendation for building linked data applications
Proceedings of the 8th International Conference on Semantic Systems
Collectively representing semi-structured data from the web
AKBC-WEKEX '12 Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction
Using information quality for the identification of relevant web data sources: a proposal
Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services
Actively soliciting feedback for query answers in keyword search-based data integration
Proceedings of the VLDB Endowment
Incrementally improving dataspaces based on user feedback
Information Systems
Big data challenge: a data management perspective
Frontiers of Computer Science: Selected Publications from Chinese Universities
Scientific data offers some of the most interesting challenges in data integration today. Scientific fields evolve rapidly and accumulate masses of observational and experimental data that need to be annotated, revised, interlinked, and made available to other scientists. From the perspective of the user, this can be a major headache, as the data they seek may initially be spread across many databases in need of integration. Worse, even if users are given a solution that integrates the current state of the source databases, new data sources keep appearing with new data items of interest to the user. Here we build upon recent ideas for creating integrated views over data sources using keyword search techniques, ranked answers, and user feedback [32] to investigate how to automatically discover when a new data source has content relevant to a user's view: in essence, performing automatic data integration for incoming data sets. The new architecture accommodates a variety of methods to discover related attributes, including label propagation algorithms from the machine learning community [2] and existing schema matchers [11]. The user may provide feedback on the suggested new results, helping the system repair any bad alignments or increase the cost of including a new source that is not useful. We evaluate our approach on actual bioinformatics schemas and data, using state-of-the-art schema matchers as components. We also discuss how our architecture can be adapted to more traditional settings with a mediated schema.
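The abstract names label propagation as one way to discover related attributes but gives no detail, so the following is only a minimal, generic sketch of the idea and not the paper's algorithm. It assumes a small undirected graph whose nodes are attributes, whose edge weights are some similarity score (for example, value overlap), and in which attributes already in the user's view carry a known label. The function name `propagate_labels`, the attribute names, and the weights are all hypothetical.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, iterations=20):
    """Generic label propagation sketch (not the paper's method).

    edges: dict mapping (u, v) pairs to a non-negative similarity weight
           (treated as undirected).
    seeds: dict mapping already-aligned attributes to their known label.
    Returns a dict: node -> {label: score}.
    """
    neighbors = defaultdict(dict)
    for (u, v), w in edges.items():
        neighbors[u][v] = w
        neighbors[v][u] = w

    # Unlabeled nodes start empty; seed nodes put all mass on their label.
    scores = {node: {} for node in neighbors}
    for node, label in seeds.items():
        scores[node] = {label: 1.0}

    for _ in range(iterations):
        updated = {}
        for node in scores:
            if node in seeds:
                # Clamp seeds so known alignments never drift.
                updated[node] = {seeds[node]: 1.0}
                continue
            total = sum(neighbors[node].values())
            acc = defaultdict(float)
            for nb, w in neighbors[node].items():
                for label, s in scores[nb].items():
                    acc[label] += w * s
            # Weighted average of the neighbors' label scores.
            updated[node] = (
                {label: s / total for label, s in acc.items()} if total else {}
            )
        scores = updated
    return scores


# Hypothetical example: two attributes from an existing view, two from a new source.
example_edges = {
    ("old.gene_name", "new.symbol"): 0.9,
    ("old.organism", "new.species"): 0.8,
    ("new.symbol", "new.species"): 0.1,
}
example_seeds = {"old.gene_name": "gene", "old.organism": "organism"}
example_scores = propagate_labels(example_edges, example_seeds)
best = {n: max(s, key=s.get) for n, s in example_scores.items() if s}
```

In this toy graph each new attribute ends up with most of its score on the label of the existing attribute it most resembles. User feedback of the kind the abstract describes could, under the same assumptions, be modeled by lowering the weight of an edge the user rejects and re-running the propagation.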