Modeling Common Real-Word Relations Using Triples Extracted from n-Grams

  • Authors:
  • Ruben Sipoš;Dunja Mladenić;Marko Grobelnik;Janez Brank

  • Affiliations:
  • Jozef Stefan Institute, Ljubljana, Slovenia 1000;Jozef Stefan Institute, Ljubljana, Slovenia 1000;Jozef Stefan Institute, Ljubljana, Slovenia 1000;Jozef Stefan Institute, Ljubljana, Slovenia 1000

  • Venue:
  • ASWC '09 Proceedings of the 4th Asian Conference on The Semantic Web
  • Year:
  • 2009

Abstract

In this paper, we present an approach that provides generalized relations for automatic ontology building based on frequent word n-grams. Using the publicly available Google n-grams as our data source, we extract relations in the form of triples and compute generalized, more abstract models from them. We propose an algorithm for building abstractions of the extracted triples using WordNet as background knowledge. We also present a novel heuristic approach to triple extraction, which achieves notably better results than deep parsing applied to n-grams. This allows us to represent information gathered from the web as a set of triples modeling the common and frequent relations expressed in natural language. Our results could be used in different settings, for example as a knowledge base for reasoning or simply as statistical data for improving the understanding of natural languages.
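
The abstract does not spell out the abstraction algorithm, so the following is only a minimal sketch of the general idea of generalizing triples with a hypernym hierarchy, not the authors' method. It assumes triples of the form (subject, verb, object) with frequency counts, and it uses a small hand-written table, HYPERNYMS, as a stand-in for WordNet. The function names, the table, and the example counts are all hypothetical.

```python
from collections import Counter

# Hypothetical toy hypernym table standing in for WordNet (word -> parent concept).
HYPERNYMS = {
    "dog": "animal",
    "cat": "animal",
    "bone": "food",
    "fish": "food",
}


def generalize(triple, levels=1):
    """Lift subject and object up to `levels` steps; keep the verb unchanged."""
    subject, verb, obj = triple

    def lift(word):
        # Words without an entry in the table are left as they are.
        for _ in range(levels):
            word = HYPERNYMS.get(word, word)
        return word

    return (lift(subject), verb, lift(obj))


def abstract_triples(triple_counts, levels=1):
    """Sum the frequencies of all triples that share a generalized form."""
    merged = Counter()
    for triple, count in triple_counts.items():
        merged[generalize(triple, levels)] += count
    return merged


# Illustrative, made-up frequencies; these are not taken from the Google n-grams.
observed = {
    ("dog", "eats", "bone"): 120,
    ("cat", "eats", "fish"): 80,
    ("dog", "chases", "cat"): 40,
}
models = abstract_triples(observed)
```

In this toy setting the two "eats" triples collapse into a single, more abstract triple whose count is the sum of the originals, which is the kind of generalized model the abstract refers to. A real implementation would also have to resolve word senses and decide how far up the hierarchy to generalize.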