The Ngram Statistics Package (Text::NSP): A Flexible Tool for Identifying Ngrams, Collocations, and Word Associations

Authors:
Ted Pedersen; Satanjeev Banerjee; Bridget T. McInnes; Saiyam Kohli; Mahesh Joshi; Ying Liu
Affiliations:
University of Minnesota, Duluth, MN; Twitter, Inc., San Francisco, CA; University of Minnesota, Minneapolis, MN; SDL Language Weaver, Inc., Los Angeles, CA; Carnegie Mellon University, Pittsburgh, PA; University of Minnesota, Minneapolis, MN
Venue:
MWE '11 Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
Year:
2011

Abstract

The Ngram Statistics Package (Text::NSP) is freely available open-source software that identifies ngrams, collocations and word associations in text. It is implemented in Perl and takes advantage of regular expressions to provide very flexible tokenization and to allow for the identification of non-adjacent ngrams. It includes a wide range of measures of association that can be used to identify collocations.
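
The three ideas in the abstract (regular-expression tokenization, non-adjacent ngrams, and association measures) can be illustrated with a minimal sketch. It is written in Python rather than Perl and does not reproduce Text::NSP's own interface: the function names, the default token pattern, the window convention, and the choice of pointwise mutual information as the example measure are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text, pattern=r"\w+"):
    """Split text into lowercase tokens; `pattern` is an illustrative default."""
    return re.findall(pattern, text.lower())


def windowed_bigrams(tokens, window=2):
    """Yield ordered pairs (w1, w2) where w2 follows w1 within `window` tokens.

    window=2 yields adjacent bigrams only; a larger window also admits
    non-adjacent pairs with up to window - 2 intervening tokens.
    """
    for i, first in enumerate(tokens):
        for second in tokens[i + 1 : i + window]:
            yield first, second


def pmi_scores(tokens, window=2):
    """Return {(w1, w2): pointwise mutual information in bits}.

    Marginals are taken from the pair counts themselves: how often w1
    occurs as the first member and w2 as the second member of any pair.
    """
    pairs = Counter(windowed_bigrams(tokens, window))
    total = sum(pairs.values())
    left = Counter()   # counts of each word in first position
    right = Counter()  # counts of each word in second position
    for (w1, w2), n in pairs.items():
        left[w1] += n
        right[w2] += n
    return {
        (w1, w2): math.log2(n * total / (left[w1] * right[w2]))
        for (w1, w2), n in pairs.items()
    }


if __name__ == "__main__":
    tokens = tokenize("New York is big. New York is old.")
    # Rank pairs by score; window=3 also counts pairs such as ("new", "is").
    for pair, score in sorted(pmi_scores(tokens, window=3).items(),
                              key=lambda item: -item[1]):
        print(pair, round(score, 3))
```

Changing `pattern` changes what counts as a token (for example, `r"[a-z]+|[.,!?]"` would keep punctuation marks as tokens), and raising `window` is what allows non-adjacent word pairs to be counted and scored.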