A linguistic knowledge discovery tool: very large ngram database search with arbitrary wildcards

Authors:
Satoshi Sekine
Affiliations:
New York University, New York, NY
Venue:
COLING '08 22nd International Conference on Computational Linguistics: Demonstration Papers
Year:
2008

Citing 4

Extracting Patterns and Relations from the World Wide Web
WebDB '98 Selected papers from the International Workshop on The World Wide Web and Databases

Automatic acquisition of hyponyms from large text corpora
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2

A search engine for natural language applications
WWW '05 Proceedings of the 14th international conference on World Wide Web

Discovering relations among named entities from large corpora
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics

Cited 1

An n-gram frequency database reference to handle MWE extraction in NLP applications
MWE '11 Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World

Quantified Score

Hi-index	0.00

Abstract

This paper describes a search tool for a very large set of ngrams. The tool supports queries with an arbitrary number of wildcards, completes a search in a fraction of a second, and returns the fillers that match the wildcards. The system runs on a single Linux PC with a reasonable amount of memory (less than 4GB) and disk space (less than 400GB). It can be a useful tool for linguistic knowledge discovery and other NLP tasks.
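
The abstract does not say how the index is built, so the following is only a toy illustration of what a wildcard ngram query and its fillers look like, not the paper's implementation. It assumes a small in-memory table of ngram counts and a simple positional inverted index. The names `TOY_COUNTS`, `build_index`, and `search` are hypothetical, and the counts are made-up sample data.

```python
from collections import defaultdict

WILDCARD = "*"

# Made-up sample data: ngram (as a tuple of tokens) -> frequency.
TOY_COUNTS = {
    ("cities", "such", "as", "Paris"): 120,
    ("cities", "such", "as", "London"): 95,
    ("countries", "such", "as", "France"): 80,
    ("fruits", "such", "as", "apples"): 40,
}


def build_index(ngram_counts):
    """Build a positional inverted index: (position, token) -> set of ngram ids."""
    ngrams = list(ngram_counts.items())
    index = defaultdict(set)
    for ngram_id, (tokens, _count) in enumerate(ngrams):
        for pos, tok in enumerate(tokens):
            index[(pos, tok)].add(ngram_id)
    return ngrams, index


def search(query, ngrams, index):
    """Return (fillers, count) pairs for a query, most frequent first.

    `query` is a tuple of tokens in which any number of positions may be
    the wildcard "*". The fillers are the tokens found at those positions.
    """
    candidates = None
    for pos, tok in enumerate(query):
        if tok == WILDCARD:
            continue
        ids = index.get((pos, tok), set())
        # Intersect the posting sets of all fixed (non-wildcard) tokens.
        candidates = ids if candidates is None else candidates & ids
    if candidates is None:
        # Query consists only of wildcards: every ngram is a candidate.
        candidates = range(len(ngrams))

    wild_positions = [i for i, tok in enumerate(query) if tok == WILDCARD]
    results = []
    for ngram_id in candidates:
        tokens, count = ngrams[ngram_id]
        if len(tokens) != len(query):
            continue  # only ngrams of the same length can match
        fillers = tuple(tokens[p] for p in wild_positions)
        results.append((fillers, count))
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


ngrams, index = build_index(TOY_COUNTS)
print(search(("cities", "such", "as", "*"), ngrams, index))
# → [(('Paris',), 120), (('London',), 95)]
```

A query with several wildcards, such as `("*", "such", "as", "*")`, returns one filler tuple per matching ngram. A sketch like this holds everything in memory, so it says nothing about how the described system reaches sub-second search within its stated memory and disk limits.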