Invited paper: Scalable reduction of large datasets to interesting subsets

Authors:
Gregory Todd Williams;Jesse Weaver;Medha Atre;James A. Hendler
Venue:
Web Semantics: Science, Services and Agents on the World Wide Web
Year:
2010

Abstract

With a huge amount of RDF data available on the web, the ability to find and access relevant information is crucial. Traditional approaches to storing, querying, and reasoning fall short when faced with web-scale data. We present a system that pairs the computational power of large clusters, used for large-scale reasoning and data access, with an efficient data structure for storing and querying the accessed data on a traditional personal computer or other resource-constrained device. We present results of using this system to load the 2009 Billion Triples Challenge dataset, materialize RDFS inferences, extract an "interesting" subset of the data using a large cluster, and further analyze the extracted data using a personal computer, all on the order of tens of minutes.
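
As a rough illustration of the "materialize RDFS inferences, then extract a subset" steps named in the abstract, the following is a minimal single-machine sketch. It is not the authors' system: the abstract does not describe their algorithms or data structure, and the parallel, cluster-based aspects are omitted entirely. The function names `materialize_rdfs` and `extract_subset`, the `ex:` identifiers, and the choice of "subjects typed as a given class" as the notion of an interesting subset are all assumptions made for the example. Only two standard RDFS entailment rules are applied (rdfs9 and rdfs11).

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def materialize_rdfs(triples):
    """Apply rules rdfs9 and rdfs11 repeatedly until no new triples appear."""
    closure = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closure if p == SUBCLASS}
        new = set()
        # rdfs11: rdfs:subClassOf is transitive.
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS, d))
        # rdfs9: an instance of a class is an instance of its superclasses.
        for s, p, o in closure:
            if p == RDF_TYPE:
                for c, d in sub:
                    if o == c:
                        new.add((s, RDF_TYPE, d))
        if new <= closure:  # fixpoint reached
            return closure
        closure |= new


def extract_subset(triples, cls):
    """Hypothetical 'interesting' subset: all triples about instances of cls."""
    subjects = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    return {t for t in triples if t[0] in subjects}


# Toy data (assumed for illustration only).
data = {
    ("ex:GradStudent", SUBCLASS, "ex:Student"),
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:alice", RDF_TYPE, "ex:GradStudent"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:cs101", RDF_TYPE, "ex:Course"),
}
closure = materialize_rdfs(data)
subset = extract_subset(closure, "ex:Person")
```

In this toy run, `ex:alice` is only asserted to be an `ex:GradStudent`, so she is found as an `ex:Person` only because the closure was materialized before the extraction step. This ordering is the reason inference precedes subset extraction in the pipeline the abstract describes.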