The number of potentially related data resources available for querying (databases, data warehouses, virtual integrated schemas) continues to grow rapidly. Perhaps no area has seen this problem as acutely as the life sciences, where hundreds of large, complex, interlinked data resources are available in fields such as proteomics, genomics, disease studies, and pharmacology. The schemas of individual databases are often large on their own, but users also need to pose queries across multiple sources, exploiting foreign keys and schema mappings. Since these users are not database experts, they typically rely on predefined Web forms and associated query templates, developed by programmers to meet particular scientists' needs. Unfortunately, such forms are scarce, often limited to a single database, and mismatched with biologists' information needs, which are often context-sensitive and span multiple databases. We present a system with which a non-expert user can author new query templates and Web forms, to be reused by anyone with related information needs. The user poses keyword queries that are matched against source relations and their attributes; the system uses sequences of associations (e.g., foreign keys, links, schema mappings, synonyms, and taxonomies) to create multiple ranked queries linking the keyword matches; and the set of queries is attached to a Web query form. The user and his or her associates may then pose specific queries by filling in the form's parameters. Importantly, the answers to these queries are ranked and annotated with data provenance, and the user provides feedback on the utility of the answers. From this feedback the system learns to assign costs to sources and associations according to the user's specific information need, which in turn changes the ranking of the queries used to generate results.
We evaluate the effectiveness of our method against "gold standard" costs from domain experts and demonstrate the method's scalability.
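The abstract describes a loop: rank candidate queries built from sequences of associations by their cost, then adjust association costs from feedback on the answers. The Python sketch below illustrates that loop under loudly stated assumptions. The schema is a small hypothetical graph whose nodes are relations and whose edges are associations with nonnegative costs. A candidate query is a single path between two keyword-matched relations, ranked by total edge cost. The cost update is a simple perceptron-style rule. None of the names here (`k_cheapest_paths`, `feedback_update`, the relations in the example), nor the update rule itself, come from the source; the system's actual query construction and learning method may differ (for example, queries over several keywords would be trees rather than paths).

```python
import heapq
from collections import Counter
from itertools import count


def edge(a, b):
    """Hypothetical: associations are treated as undirected, keyed by their endpoints."""
    return frozenset((a, b))


def path_edges(path):
    return [edge(a, b) for a, b in zip(path, path[1:])]


def path_cost(costs, path):
    return sum(costs[e] for e in path_edges(path))


def k_cheapest_paths(graph, costs, source, target, k):
    """Best-first enumeration of up to k loop-free source->target paths,
    cheapest first. Correct for nonnegative costs; exponential in the worst
    case, so only suitable for small illustrative graphs."""
    tie = count()  # tie-breaker so heapq never compares the path lists
    heap = [(0.0, next(tie), [source])]
    results = []
    while heap and len(results) < k:
        cost, _, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            results.append((cost, path))
            continue
        for nbr in graph.get(node, ()):
            if nbr not in path:
                heapq.heappush(heap, (cost + costs[edge(node, nbr)], next(tie), path + [nbr]))
    return results


def feedback_update(costs, preferred, other, rate=0.5, margin=1.0, floor=0.01):
    """Perceptron-style update (an illustrative assumption, not the paper's learner):
    if the preferred query is not cheaper than the other by at least `margin`,
    move cost off the preferred query's edges and onto the other query's edges.
    Edges shared by both queries cancel out; costs stay >= floor so the
    best-first search above remains valid."""
    if path_cost(costs, preferred) + margin <= path_cost(costs, other):
        return  # ranking already agrees with the feedback
    delta = Counter(path_edges(other))
    delta.subtract(path_edges(preferred))
    for e, d in delta.items():
        costs[e] = max(floor, costs[e] + rate * d)


# Hypothetical schema graph: relations as nodes, associations as weighted edges.
associations = [
    ("gene", "protein", 1.0),     # e.g. a foreign key
    ("gene", "disease", 1.5),     # e.g. a cross-database link
    ("protein", "disease", 3.0),  # e.g. a schema mapping
    ("protein", "go_term", 0.5),  # e.g. a taxonomy annotation
    ("go_term", "disease", 1.0),
]
costs, graph = {}, {}
for a, b, c in associations:
    costs[edge(a, b)] = c
    graph.setdefault(a, []).append(b)
    graph.setdefault(b, []).append(a)

# Assume the keywords matched the "protein" and "disease" relations.
before = k_cheapest_paths(graph, costs, "protein", "disease", 3)
# The user rates answers from the gene-based query above those from the GO-based one.
feedback_update(costs, ["protein", "gene", "disease"], ["protein", "go_term", "disease"])
after = k_cheapest_paths(graph, costs, "protein", "disease", 3)
```

Here `before` ranks the path through `go_term` first (cost 1.5), followed by the path through `gene` (2.5) and the direct link (3.0). After one round of feedback, the gene-based query moves to the top (1.5) and the GO-based query drops to second (2.5). This mirrors, in miniature, the effect the abstract describes: costs learned from answer feedback change which queries rank highest. Giving each association its own cost keeps the update easy to follow; a fuller treatment might share costs across features of sources and associations.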