Knowledge capture from multiple online sources with the extensible web retrieval toolkit (eWRT)

Authors:
Albert Weichselbraun; Arno Scharl; Heinz-Peter Lang
Affiliations:
University of Applied Sciences Chur, Chur, Switzerland; University Vienna, Vienna, Austria; Vienna University of Economics & Business, Vienna, Austria
Venue:
Proceedings of the seventh international conference on Knowledge capture
Year:
2013

Abstract

Knowledge capture approaches in the age of massive Web data require robust and scalable mechanisms to acquire, consolidate and pre-process large amounts of heterogeneous data, both unstructured and structured. This paper addresses this requirement by introducing the Extensible Web Retrieval Toolkit (eWRT), a modular Python API for retrieving social data from Web sources such as Delicious, Flickr, Yahoo! and Wikipedia. eWRT has been released as an open source library under the GNU GPLv3. It includes classes for caching and data management, and provides low-level text processing capabilities, including language detection, phonetic string similarity measures, and string normalization.
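
The caching and low-level text processing capabilities named in the abstract can be illustrated with a minimal Python sketch. It does not use eWRT's actual classes or function names, which the abstract does not give: `normalize`, `soundex`, `phonetically_similar`, `MemoryCache`, and `fake_fetch` are hypothetical names introduced here. Soundex is used as one well-known phonetic measure and is not necessarily the one eWRT implements, and the fetch function is a stand-in that makes no network request.

```python
import string
import unicodedata

# American Soundex consonant codes; vowels, h, w and y carry no code.
_CODES = {}
for _letters, _digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")]:
    for _ch in _letters:
        _CODES[_ch] = _digit


def normalize(text):
    """Strip diacritics, lowercase, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


def soundex(word):
    """Return the four-character Soundex code of a word ('' if no letters)."""
    letters = [ch for ch in normalize(word) if ch in string.ascii_lowercase]
    if not letters:
        return ""
    first = letters[0]
    previous = _CODES.get(first, "")
    digits = []
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with equal codes
        code = _CODES.get(ch, "")
        if code and code != previous:
            digits.append(code)
        previous = code  # a vowel resets this, so a repeated code counts again
    return (first.upper() + "".join(digits) + "000")[:4]


def phonetically_similar(a, b):
    """Treat two words as similar when their Soundex codes match."""
    return soundex(a) != "" and soundex(a) == soundex(b)


class MemoryCache:
    """In-memory cache that keys results by the normalized query string."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._store = {}

    def get(self, query):
        key = normalize(query)
        if key not in self._store:
            self._store[key] = self._fetch(key)
        return self._store[key]


calls = []


def fake_fetch(query):
    """Stand-in for a remote lookup; records each call it receives."""
    calls.append(query)
    return {"query": query, "code": soundex(query)}


cache = MemoryCache(fake_fetch)
first_result = cache.get("  Zürich ")
second_result = cache.get("zurich")  # same normalized key, served from cache
```

Keying the cache on the normalized string means that spelling variants differing only in case, whitespace, or diacritics share one cached result, which is one plausible reason to combine normalization with caching when querying rate-limited Web sources.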