Comparing Window and Syntax Based Strategies for Semantic Extraction

  • Authors:
  • Pablo Gamallo Otero

  • Affiliations:
  • Departamento de Língua Espanhola, Faculdade de Filologia, Universidade de Santiago de Compostela, Galiza, Spain

  • Venue:
  • PROPOR '08: Proceedings of the 8th International Conference on Computational Processing of the Portuguese Language
  • Year:
  • 2008

Abstract

In this paper, we describe and compare two approaches for extracting similar words from large corpora. In particular, we compare a method based on syntactic contexts with two strategies relying on windows of tagged words, one using word order and the other bags of words. On a Portuguese corpus of 12 million words, syntactic contexts produce significantly better results for both frequent and less frequent words.
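
To make the three kinds of context concrete, the following is a minimal sketch and not the paper's implementation. It assumes plain tokens rather than tagged words, hand-written dependency triples in place of parser output, and cosine similarity as the comparison measure; the abstract specifies none of these details. The function names `window_contexts`, `syntactic_contexts`, and `cosine` are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def window_contexts(tokens, size=2, ordered=False):
    """Map each word to a Counter of the words found within `size` positions.

    ordered=False treats the window as a bag of words.
    ordered=True also records whether a neighbour is to the left or right,
    a simple stand-in (assumption) for a word-order strategy.
    """
    contexts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        start = max(0, i - size)
        end = min(len(tokens), i + size + 1)
        for j in range(start, end):
            if j == i:
                continue
            if ordered:
                side = "L" if j < i else "R"
                feature = (side, tokens[j])
            else:
                feature = tokens[j]
            contexts[word][feature] += 1
    return contexts


def syntactic_contexts(triples):
    """Map each word to a Counter of its syntactic contexts.

    `triples` holds (head, relation, dependent) tuples, assumed to come
    from some parser; each triple yields one context for the head and
    one for the dependent.
    """
    contexts = defaultdict(Counter)
    for head, relation, dependent in triples:
        contexts[head][(relation, "dep", dependent)] += 1
        contexts[dependent][(relation, "head", head)] += 1
    return contexts


def cosine(a, b):
    """Cosine similarity between two context Counters (0.0 if either is empty)."""
    dot = sum(a[f] * b[f] for f in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy data (invented for illustration only).
tokens = "the cat drinks milk the dog drinks water".split()
triples = [
    ("drinks", "subj", "cat"),
    ("drinks", "obj", "milk"),
    ("drinks", "subj", "dog"),
    ("drinks", "obj", "water"),
]

bag = window_contexts(tokens, size=1)
order = window_contexts(tokens, size=1, ordered=True)
syntax = syntactic_contexts(triples)

for name, ctx in [("bag of words", bag), ("word order", order), ("syntax", syntax)]:
    print(name, cosine(ctx["cat"], ctx["dog"]), cosine(ctx["cat"], ctx["milk"]))
```

On this toy input, the bag-of-words window cannot tell "cat" from "milk" because both occur next to "the" and "drinks", whereas the ordered window and the syntactic contexts keep subjects and objects apart. This only illustrates how the representations differ and says nothing about the results reported for the 12-million-word corpus.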