A linguistic knowledge discovery tool: very large ngram database search with arbitrary wildcards

  • Authors:
  • Satoshi Sekine

  • Affiliations:
  • New York University, New York, NY

  • Venue:
  • COLING '08 22nd International Conference on Computational Linguistics: Demonstration Papers
  • Year:
  • 2008

Abstract

In this paper, we describe a search tool for a very large set of ngrams. The tool supports queries with an arbitrary number of wildcards. A search takes a fraction of a second and returns the fillers of the wildcards, that is, the tokens that occupy the wildcard positions in the matching ngrams. The system runs on a single Linux PC with a reasonable amount of memory (less than 4GB) and disk space (less than 400GB). It can be a very useful tool for linguistic knowledge discovery and other NLP tasks.
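
To make the idea of a wildcard query and its fillers concrete, the following is a minimal in-memory sketch in Python. It is not the system described in the paper, whose index design is not given in this abstract. The names `build_index`, `search`, and `WILDCARD`, the positional inverted index, and the ngram counts are all assumptions invented for illustration.

```python
from collections import defaultdict

WILDCARD = "*"  # hypothetical wildcard symbol, chosen for this sketch


def build_index(ngram_counts):
    """Build a toy positional index: (ngram length, position, token) -> ngram ids."""
    ngrams = list(ngram_counts.items())
    index = defaultdict(set)
    for ngram_id, (ngram, _count) in enumerate(ngrams):
        for pos, tok in enumerate(ngram):
            index[(len(ngram), pos, tok)].add(ngram_id)
    return ngrams, index


def search(ngrams, index, query):
    """Return (fillers, count) pairs for matching ngrams, most frequent first."""
    n = len(query)
    fixed = [(pos, tok) for pos, tok in enumerate(query) if tok != WILDCARD]
    wild = [pos for pos, tok in enumerate(query) if tok == WILDCARD]
    if fixed:
        # Intersect the posting sets of every non-wildcard position.
        postings = [index.get((n, pos, tok), set()) for pos, tok in fixed]
        ids = set.intersection(*postings)
    else:
        # All-wildcard query: every ngram of the right length matches.
        ids = {i for i, (ng, _) in enumerate(ngrams) if len(ng) == n}
    results = [(tuple(ngrams[i][0][p] for p in wild), ngrams[i][1]) for i in ids]
    return sorted(results, key=lambda r: (-r[1], r[0]))


# Invented toy data; the counts are not taken from any real corpus.
counts = {
    ("cities", "such", "as", "Paris"): 120,
    ("cities", "such", "as", "London"): 300,
    ("fruits", "such", "as", "apples"): 80,
    ("such", "as", "Paris"): 500,
}
ngrams, index = build_index(counts)
one_wildcard = search(ngrams, index, ("cities", "such", "as", "*"))
two_wildcards = search(ngrams, index, ("*", "such", "as", "*"))
```

Here `one_wildcard` lists the fillers `London` and `Paris` with their counts, and `two_wildcards` returns pairs of fillers such as `("cities", "London")`. A set-intersection index like this one holds everything in memory, so it only illustrates the query semantics; it says nothing about how the described system reaches sub-second search within its stated memory and disk limits.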