Unsupervised generation of data mining features from linked open data

  • Authors:
  • Heiko Paulheim;Johannes Fürnkranz

  • Affiliations:
  • Technische Universität Darmstadt, Darmstadt, Germany;Technische Universität Darmstadt, Darmstadt, Germany

  • Venue:
  • Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics
  • Year:
  • 2012

Abstract

The quality of the results of a data mining process strongly depends on the quality of the data it processes. The more useful background knowledge a dataset contains, the more likely a good result becomes. In this paper, we present a fully automatic approach for enriching data with features that are derived from Linked Open Data, a very large, openly available data collection. We identify six different types of feature generators, which are implemented in our open-source tool FeGeLOD. In four case studies, we show that our approach can be applied to different problems, ranging from classical data mining to ontology learning and ontology matching on the semantic web. The results show that features generated from publicly available information may enable data mining in problems where no features are available at all, and may help improve the results for tasks where some features are already available.