Exploring the data-driven prediction of prepositions in English

  • Authors: Anas Elghafari; Detmar Meurers; Holger Wunsch
  • Affiliations: Universität Tübingen; Universität Tübingen; Universität Tübingen
  • Venue: COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
  • Year: 2010

Abstract

Prepositions in English are a well-known challenge for language learners, and the computational analysis of preposition usage has attracted significant attention. Such research generally starts out by developing models of preposition usage for native English based on a range of features, from shallow surface evidence to deep linguistically-informed properties. While we agree that ultimately a combination of shallow and deep features is needed to balance the precision of exemplars against the usefulness of generalizations in avoiding data sparsity, in this paper we explore the limits of a purely surface-based prediction of prepositions. Using a web-as-corpus approach, we investigate classification based solely on the relative number of occurrences of target n-grams that vary in the preposition used. We show that such a surface-based approach is competitive with the published state-of-the-art results relying on complex feature sets. Where enough data is available, it is thus possible in a surprising number of cases to obtain sufficient information from the relatively narrow window of context provided by n-grams that are small enough to occur frequently but large enough to carry predictive information about preposition usage.
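The count-based classification described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The candidate preposition list, the window sizes, the back-off from larger to smaller n-grams, and the toy count table are all assumptions made for illustration; in a web-as-corpus setting the `count` callable would wrap a search-engine or n-gram corpus lookup instead of a dictionary.

```python
from typing import Callable, Optional, Sequence

# Hypothetical candidate set; the abstract does not list the prepositions used.
PREPOSITIONS = ("of", "in", "on", "for", "to", "with", "at", "by", "from")


def predict_preposition(
    tokens: Sequence[str],
    slot: int,
    count: Callable[[str], int],
    candidates: Sequence[str] = PREPOSITIONS,
    max_n: int = 5,
    min_n: int = 3,
) -> Optional[str]:
    """Pick the candidate whose n-gram around `slot` occurs most often.

    Tries the largest window first and backs off to smaller ones when no
    candidate n-gram has been seen (the back-off is an assumption of this
    sketch). Returns None if every window is unseen.
    """
    for n in range(max_n, min_n - 1, -1):
        half = n // 2
        # Context words to the left and right of the preposition slot.
        left = list(tokens[max(0, slot - half):slot])
        right = list(tokens[slot + 1:slot + 1 + (n - 1 - len(left))])
        # One n-gram per candidate, differing only in the preposition.
        scores = {p: count(" ".join([*left, p, *right])) for p in candidates}
        best = max(scores, key=scores.get)
        if scores[best] > 0:
            return best
    return None


# Toy counts standing in for web hit counts (made-up numbers).
TOY_COUNTS = {"depends on the": 900, "depends of the": 12, "depends in the": 4}


def toy_count(ngram: str) -> int:
    return TOY_COUNTS.get(ngram, 0)


sentence = "it depends ? the weather".split()
prediction = predict_preposition(sentence, slot=2, count=toy_count)
```

With the toy table, the 5-gram and 4-gram windows are unseen, so the sketch falls back to the 3-gram "depends on the" and predicts "on". This mirrors the trade-off named in the abstract: larger n-grams carry more predictive context but occur less often.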