MYSTIQ: a system for finding more answers by using probabilities

Authors:
Jihad Boulos;Nilesh Dalvi;Bhushan Mandhani;Shobhit Mathur;Chris Re;Dan Suciu
Affiliations:
American University of Beirut, Lebanon;University of Washington;American University of Beirut, Lebanon;American University of Beirut, Lebanon;American University of Beirut, Lebanon;American University of Beirut, Lebanon
Venue:
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Year:
2005

References (9)

VAGUE: a user interface to relational databases that permits vague queries. ACM Transactions on Information Systems (TOIS).
A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Transactions on Information Systems (TOIS).
ProbView: a flexible probabilistic database system. ACM Transactions on Database Systems (TODS).
The XXL search engine: ranked retrieval of XML data using indexes and ontologies. Proceedings of the 2002 ACM SIGMOD international conference on Management of data.
Consistent Answers from Integrated Data Sources. FQAS '02 Proceedings of the 5th International Conference on Flexible Query Answering Systems.
Web-scale information extraction in knowitall: (preliminary results). Proceedings of the 13th international conference on World Wide Web.
CORDS: automatic discovery of correlations and soft functional dependencies. SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data.
Aggregate operators in probabilistic databases. Journal of the ACM (JACM).
Efficient query evaluation on probabilistic databases. VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30.

Cited by (62)

ULDBs: databases with uncertainty and lineage. VLDB '06 Proceedings of the 32nd international conference on Very large data bases.
Creating probabilistic databases from information extraction models. VLDB '06 Proceedings of the 32nd international conference on Very large data bases.
Magic Sets and their application to data integration. Journal of Computer and System Sciences.
The dichotomy of conjunctive queries on probabilistic structures. Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
Management of data with uncertainties. Proceedings of the sixteenth ACM conference on Conference on information and knowledge management.
Materialized views in probabilistic databases: for information exchange and query optimization. VLDB '07 Proceedings of the 33rd international conference on Very large data bases.
Approximating predicates and expressive queries on probabilistic databases. Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
Joining the results of heterogeneous search engines. Information Systems.
Quality Measures in Uncertain Data Management. SUM '07 Proceedings of the 1st international conference on Scalable Uncertainty Management.
Query Selectivity Estimation for Uncertain Data. SSDBM '08 Proceedings of the 20th international conference on Scientific and Statistical Database Management.
Managing Probabilistic Data with MystiQ: The Can-Do, the Could-Do, and the Can't-Do. SUM '08 Proceedings of the 2nd international conference on Scalable Uncertainty Management.
Optimization of Queries over Interval Probabilistic Data. SUM '08 Proceedings of the 2nd international conference on Scalable Uncertainty Management.
On the provenance of non-answers to queries over extracted data. Proceedings of the VLDB Endowment.
Approximate lineage for probabilistic databases. Proceedings of the VLDB Endowment.
Generating efficient safe query plans for probabilistic databases. Data & Knowledge Engineering.
Information Extraction. Foundations and Trends in Databases.
Computing all skyline probabilities for uncertain data. Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
Ranking distributed probabilistic data. Proceedings of the 2009 ACM SIGMOD International Conference on Management of data.
E = MC3: managing uncertain enterprise data in a cluster-computing environment. Proceedings of the 2009 ACM SIGMOD International Conference on Management of data.
A Database System for Absorbing Conflicting and Uncertain Information from Multiple Correspondents. BNCOD 26 Proceedings of the 26th British National Conference on Databases: Dataspace: The Final Frontier.
Towards Relational Schema Uncertainty. SUM '09 Proceedings of the 3rd International Conference on Scalable Uncertainty Management.
Qualitative effects of knowledge rules and user feedback in probabilistic data integration. The VLDB Journal.
Representing uncertain data: models, properties, and algorithms. The VLDB Journal.
Creating probabilistic databases from duplicated data. The VLDB Journal.
PrDB: managing and exploiting rich correlations in probabilistic databases. The VLDB Journal.
Probabilistic histograms for probabilistic data. Proceedings of the VLDB Endowment.
A Model for Contextual Cooperative Query Answering in E-Commerce Applications. FQAS '09 Proceedings of the 8th International Conference on Flexible Query Answering Systems.
Bridging the gap between intensional and extensional query evaluation in probabilistic databases. Proceedings of the 13th International Conference on Extending Database Technology.
Probabilistic data exchange. Proceedings of the 13th International Conference on Database Theory.
A Survey on Uncertainty Management in Data Integration. Journal of Data and Information Quality (JDIQ).
Transducing Markov sequences. Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
Consistent query answers in inconsistent probabilistic databases. Proceedings of the 2010 ACM SIGMOD International Conference on Management of data.
Threshold query optimization for uncertain data. Proceedings of the 2010 ACM SIGMOD International Conference on Management of data.
Probabilistic string similarity joins. Proceedings of the 2010 ACM SIGMOD International Conference on Management of data.
US-SQL: managing uncertain schemata. Proceedings of the 2010 ACM SIGMOD International Conference on Management of data.
Combining intensional with extensional query evaluation in tuple independent probabilistic databases. Information Sciences: an International Journal.
Set similarity join on probabilistic data. Proceedings of the VLDB Endowment.
Scalable probabilistic databases with factor graphs and MCMC. Proceedings of the VLDB Endowment.
Read-once functions and query evaluation in probabilistic databases. Proceedings of the VLDB Endowment.
Foundations of uncertain-data integration. Proceedings of the VLDB Endowment.
Probabilistic data: a tiny survey. SUM'10 Proceedings of the 4th international conference on Scalable uncertainty management.
Probabilistic inverse ranking queries in uncertain databases. The VLDB Journal.
Ranking queries on uncertain data. The VLDB Journal.
Asymptotically efficient algorithms for skyline probabilities of uncertain data. ACM Transactions on Database Systems (TODS).
Efficient query evaluation over probabilistic XML with long-distance dependencies. Proceedings of the 2011 Joint EDBT/ICDT Ph.D. Workshop.
Efficient query answering in probabilistic RDF graphs. Proceedings of the 2011 ACM SIGMOD International Conference on Management of data.
Jigsaw: efficient optimization over uncertain enterprise data. Proceedings of the 2011 ACM SIGMOD International Conference on Management of data.
Fuzzy prophet: parameter exploration in uncertain enterprise scenarios. Proceedings of the 2011 ACM SIGMOD International Conference on Management of data.
Probabilistic data exchange. Journal of the ACM (JACM).
Database foundations for scalable RDF processing. RW'11 Proceedings of the 7th international conference on Reasoning web: semantic technologies for the web of data.
Cost-efficient repair in inconsistent probabilistic databases. Proceedings of the 20th ACM international conference on Information and knowledge management.
Interactive reasoning in uncertain RDF knowledge bases. Proceedings of the 20th ACM international conference on Information and knowledge management.
Stochastic skylines. ACM Transactions on Database Systems (TODS).
H-Tree: a hybrid structure for confidence computation in probabilistic databases. APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications.
Probabilistic databases with MarkoViews. Proceedings of the VLDB Endowment.
Deco: declarative crowdsourcing. Proceedings of the 21st ACM international conference on Information and knowledge management.
Efficient processing of probabilistic group subspace skyline queries in uncertain databases. Information Systems.
Ontology-based access to probabilistic data with OWL QL. ISWC'12 Proceedings of the 11th international conference on The Semantic Web - Volume Part I.
Skyline queries in crowd-enabled databases. Proceedings of the 16th International Conference on Extending Database Technology.
Efficient and scalable monitoring and summarization of large probabilistic data. Proceedings of the 2013 Sigmod/PODS Ph.D. symposium on PhD symposium.
Causality and responsibility: probabilistic queries revisited in uncertain databases. Proceedings of the 22nd ACM international conference on Conference on information & knowledge management.
A temporal-probabilistic database model for information extraction. Proceedings of the VLDB Endowment.

Abstract

MystiQ is a system that uses probabilistic query semantics [3] to find answers in a large number of data sources of less than perfect quality. There are many reasons why data originating from many different sources may be of poor quality, and therefore difficult to query: the same data item may have different representations in different sources; the schema alignments needed by a query system are imperfect and noisy; different sources may contain contradictory information and, in particular, their combined data may violate some global integrity constraints; and fuzzy matches between objects from different sources may return false positives or false negatives. Even in such an environment, users sometimes want to ask complex, structurally rich queries using constructs typically found in SQL: joins, subqueries, existential/universal quantifiers, aggregates, and group-by. For example, scientists may use such queries to query multiple scientific data sources, or a law enforcement agency may use them to find rare associations across multiple data sources. If standard query semantics were applied to such queries, all but the most trivial ones would return an empty answer.
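To illustrate the contrast in the last sentence, here is a toy sketch in Python. It assumes a tuple-independent model in which each fuzzy title match between two sources is an independent event with a given probability; the data, match scores, and function names are hypothetical and are not MystiQ's code or API. Under standard semantics, an equality join of the two sources returns nothing. Under possible-worlds semantics, each candidate answer is instead returned with the probability that at least one of its supporting matches holds, 1 - prod(1 - p), and answers are ranked by that probability.

```python
from math import prod

# Hypothetical sources that name the same movies differently.
source_a = [("Star Wars IV", 1977), ("The Matrix", 1999)]
source_b = [("Star Wars: A New Hope", "Lucas"), ("Matrix, The", "Wachowski")]

# Hypothetical fuzzy-match scores, treated as probabilities.
# Assumption: tuple independence, i.e. each match is an independent event.
match_prob = {
    ("Star Wars IV", "Star Wars: A New Hope"): 0.8,
    ("The Matrix", "Matrix, The"): 0.9,
    ("The Matrix", "Star Wars: A New Hope"): 0.05,
}


def exact_answers(a, b):
    """Standard semantics: titles in `a` that appear verbatim in `b`."""
    titles_b = {title for title, _ in b}
    return [title for title, _ in a if title in titles_b]


def probabilistic_answers(a, b, probs):
    """Possible-worlds semantics for 'titles in a that match some title in b'.

    An answer holds if at least one of its independent supporting matches
    holds, so P(answer) = 1 - prod(1 - p) over those matches.
    """
    support = {}
    for title_a, _ in a:
        for title_b, _ in b:
            p = probs.get((title_a, title_b), 0.0)
            if p > 0:
                support.setdefault(title_a, []).append(p)
    scored = {t: 1 - prod(1 - p for p in ps) for t, ps in support.items()}
    # Rank answers by probability, highest first.
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    print(exact_answers(source_a, source_b))
    for title, p in probabilistic_answers(source_a, source_b, match_prob):
        print(f"{title}: {p:.3f}")
```

Running it prints `[]` for the exact join, then `The Matrix: 0.905` and `Star Wars IV: 0.800`. The weak 0.05 match only raises the probability of an answer that already exists, because the query projects onto titles from `source_a`. The closed-form 1 - prod(1 - p) is valid only under the independence assumption; correlated matches would require reasoning over the full lineage of each answer.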