Structure inference for linked data sources using clustering

Authors:
Klitos Christodoulou;Norman W. Paton;Alvaro A. A. Fernandes
Affiliations:
University of Manchester, Manchester, UK;University of Manchester, Manchester, UK;University of Manchester, Manchester, UK
Venue:
Proceedings of the Joint EDBT/ICDT 2013 Workshops
Year:
2013

Abstract

Linked Data (LD) is supplementing the World Wide Web of documents with a Web of data, as is apparent from the number of LD repositories available as part of the Linked Open Data (LOD) cloud. At the instance level, LD sources use a combination of terms from various vocabularies, expressed in RDFS/OWL, to describe their data and publish them on the Web. However, LD sources do not organise their data under a specific structure analogous to a relational schema; instead, data can adhere to multiple vocabularies. Expressing SPARQL queries over LD sources, usually through a SPARQL endpoint that is presented to the user, requires knowledge of the predicates used, so that user requirements can be expressed as graph patterns. Although LD lowers the barriers to data publication by using a homogeneous language (i.e., RDF), sources organise their data with different structures and terminologies. We would like to have a synopsis of how data are organised in LD sources to inform the formulation of queries over such sources. In this paper we make the case that structural summaries over LD sources can inform query formulation and provide support for data integration and query processing over multiple LD sources. To this end, we propose an approach that builds on a hierarchical clustering algorithm for inferring structural summaries over LD sources. We have conducted an experimental evaluation using various LD sources to ascertain the extent to which our technique can successfully infer structural summaries from LD sources.
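
To make the idea of inferring a structural summary by hierarchical clustering more concrete, the following is a minimal sketch and not the authors' algorithm. The abstract only states that the approach builds on a hierarchical clustering algorithm. Everything else here is an assumption made for illustration: the feature used (the set of predicates attached to each subject), the Jaccard similarity, average linkage, the 0.5 stopping threshold, the function names, and the toy triples.

```python
from itertools import combinations

# Toy (subject, predicate, object) triples; invented purely for illustration.
TRIPLES = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "foaf:mbox", "mailto:bob@example.org"),
    ("ex:bob", "foaf:homepage", "http://example.org/bob"),
    ("ex:paper1", "dc:title", "A paper"),
    ("ex:paper1", "dc:creator", "ex:alice"),
    ("ex:paper2", "dc:title", "Another paper"),
    ("ex:paper2", "dc:creator", "ex:bob"),
    ("ex:paper2", "dc:date", "2013"),
]


def predicate_sets(triples):
    """Map each subject to the set of predicates it is described with."""
    sets = {}
    for subject, predicate, _ in triples:
        sets.setdefault(subject, set()).add(predicate)
    return sets


def jaccard(a, b):
    """Jaccard similarity of two non-empty predicate sets."""
    return len(a & b) / len(a | b)


def average_linkage(c1, c2, psets):
    """Mean pairwise Jaccard similarity between the members of two clusters."""
    pairs = [(x, y) for x in c1 for y in c2]
    return sum(jaccard(psets[x], psets[y]) for x, y in pairs) / len(pairs)


def cluster(triples, threshold=0.5):
    """Agglomerative clustering: merge the most similar pair of clusters
    until no pair reaches the (assumed) similarity threshold."""
    psets = predicate_sets(triples)
    clusters = [frozenset([s]) for s in sorted(psets)]
    while len(clusters) > 1:
        best_sim, a, b = max(
            ((average_linkage(a, b, psets), a, b)
             for a, b in combinations(clusters, 2)),
            key=lambda t: t[0],
        )
        if best_sim < threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters, psets


def summarise(clusters, psets):
    """Describe each cluster by its members and the union of their predicates."""
    return [
        (sorted(c), sorted(set().union(*(psets[s] for s in c))))
        for c in clusters
    ]


if __name__ == "__main__":
    found, psets = cluster(TRIPLES)
    for members, predicates in summarise(found, psets):
        print(members, "->", predicates)
```

On the toy data, the two person-like subjects and the two publication-like subjects end up in separate clusters, and the predicates listed for each cluster act as a rough, schema-like synopsis that could guide the writing of SPARQL graph patterns. A real LD source would need its triples fetched from an endpoint or dump, and the choice of features, similarity measure, and cut-off would have to follow the paper rather than this sketch.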