Open data platforms such as data.gov or opendata.socrata.com provide a huge amount of valuable information, publicly available to anyone. This data has the potential to drive innovation and lead to a more democratic and transparent society. Still, the platforms it is offered on have some unique problems: their free-for-all nature, the lack of publishing standards, and the multitude of domains and authors represented on them lead to new integration and standardization problems, such as duplicated or partitioned datasets. At the same time, crowd-based data integration techniques are emerging as a new way of dealing with data integration problems. However, these methods still require input in the form of specific questions or tasks that can be passed to the crowd. This paper identifies several classes of integration problems on open data platforms and proposes a method for identifying and ranking potential problems of these classes in this context. In this method, an open data platform is modeled as a graph of datasets, so that potential integration problems, called integration hypotheses, can be identified by analyzing the graph for specific patterns. The paper concludes with a comprehensive evaluation using one of the largest open data platforms, opendata.socrata.com.
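The graph-based idea can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not say how edges are defined or how hypotheses are ranked. The sketch assumes that each dataset is reduced to its set of column names, that edges connect datasets whose schemas have a Jaccard similarity above a threshold, and that hypotheses are ranked by that similarity. The toy catalog, the function names, the threshold, and the pattern labels are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy catalog: dataset id -> set of column names (invented for illustration).
catalog = {
    "crime_2011": {"date", "district", "offense"},
    "crime_2012": {"date", "district", "offense"},
    "crime_2011_copy": {"date", "district", "offense", "notes"},
    "parks": {"name", "area", "district"},
}

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_graph(catalog, min_sim=0.5):
    """Return weighted edges (u, v, sim) between datasets with similar schemas.

    Assumption: schema overlap is the only edge criterion; min_sim is arbitrary.
    """
    edges = []
    for u, v in combinations(sorted(catalog), 2):
        sim = jaccard(catalog[u], catalog[v])
        if sim >= min_sim:
            edges.append((u, v, sim))
    return edges

def rank_hypotheses(edges):
    """Label each edge with an assumed pattern and rank by similarity, highest first."""
    hyps = []
    for u, v, sim in edges:
        # Assumed patterns: identical schemas suggest a partitioned or duplicated
        # dataset; strongly overlapping schemas suggest a near-duplicate.
        kind = "partition-or-duplicate" if sim == 1.0 else "near-duplicate"
        hyps.append((sim, kind, u, v))
    return sorted(hyps, key=lambda h: (-h[0], h[2], h[3]))

hypotheses = rank_hypotheses(build_graph(catalog))
for sim, kind, u, v in hypotheses:
    print(f"{sim:.2f} {kind}: {u} <-> {v}")
```

Each ranked tuple could then be phrased as a question for crowd workers, for example whether two datasets are yearly partitions of the same table. A real platform would likely need richer evidence than column names alone, such as titles, publishers, or value overlap.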