Using lexical patterns in the Google Web 1T corpus to deduce semantic relations between nouns

  • Authors:
  • Paul Nulty; Fintan Costello

  • Affiliations:
  • University College Dublin, Belfield, Dublin, Ireland; University College Dublin, Belfield, Dublin, Ireland

  • Venue:
  • DEW '09 Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions
  • Year:
  • 2009

Abstract

This paper investigates methods for using lexical patterns in a corpus to deduce the semantic relation that holds between the two nouns in a noun-noun compound phrase such as "flu virus" or "morning exercise". Much of the previous work in this area has used automated queries to commercial web search engines. In our experiments we instead use the Google Web 1T corpus. This corpus contains every 2-, 3-, 4- and 5-gram occurring more than 40 times in Google's index of the web, and it has the advantage of being available to researchers directly rather than through a web interface. We evaluate the performance of the Web 1T corpus on this task against similar systems in the literature, and we also investigate which kinds of lexical patterns are most informative when identifying the semantic relation between two nouns.
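
As a rough illustration of what "lexical patterns" can mean here, the sketch below collects the word sequences that join two nouns in n-gram records and sums their frequencies. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `joining_patterns`, the `N1`/`N2` pattern notation, the tab-separated "n-gram, then count" record layout, and the sample records and counts are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def joining_patterns(ngram_lines, noun_a, noun_b):
    """Sum the counts of word sequences that join two nouns in n-gram records.

    Assumes each record looks like "w1 w2 ... wn<TAB>count" (an assumed layout).
    Returns a Counter keyed by patterns such as "N2 that causes N1", where
    N1 stands for noun_a and N2 for noun_b.
    """
    patterns = Counter()
    for line in ngram_lines:
        text, _, count = line.rstrip("\n").rpartition("\t")
        if not text or not count.isdigit():
            continue  # skip records that do not match the assumed layout
        tokens = text.lower().split()
        if len(tokens) < 3:
            continue  # need at least one joining word between the nouns
        first, last = tokens[0], tokens[-1]
        middle = " ".join(tokens[1:-1])
        if first == noun_a and last == noun_b:
            patterns["N1 " + middle + " N2"] += int(count)
        elif first == noun_b and last == noun_a:
            patterns["N2 " + middle + " N1"] += int(count)
    return patterns


# Made-up records for illustration only; these are not real corpus counts.
sample = [
    "virus that causes flu\t120",
    "flu caused by the virus\t85",
    "virus causes flu\t60",
    "morning exercise routine\t50",
    "virus that causes flu\t30",
]

for pattern, total in joining_patterns(sample, "flu", "virus").most_common():
    print(pattern, total)
```

Pattern counts of this kind could then serve as features for a classifier that assigns a semantic relation to the noun pair; which classifier and which pattern lengths work best is exactly the kind of question the paper examines, and the sketch makes no claim about it.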