The Ngram Statistics Package (Text::NSP): a flexible tool for identifying ngrams, collocations, and word associations

  • Authors:
  • Ted Pedersen; Satanjeev Banerjee; Bridget T. McInnes; Saiyam Kohli; Mahesh Joshi; Ying Liu

  • Affiliations:
  • University of Minnesota, Duluth, MN; Twitter, Inc., San Francisco, CA; University of Minnesota, Minneapolis, MN; SDL Language Weaver, Inc., Los Angeles, CA; Carnegie Mellon University, Pittsburgh, PA; University of Minnesota, Minneapolis, MN

  • Venue:
  • MWE '11 Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
  • Year:
  • 2011

Abstract

The Ngram Statistics Package (Text::NSP) is freely available open-source software that identifies ngrams, collocations, and word associations in text. It is implemented in Perl and uses regular expressions both to support highly flexible tokenization and to identify non-adjacent ngrams. It includes a wide range of measures of association that can be used to identify collocations.
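The pipeline the abstract describes (regex tokenization, counting ngrams that may be non-adjacent, then scoring them with a measure of association) can be sketched in a few lines. This is a minimal Python illustration of the ideas, not Text::NSP's Perl code or its command-line interface. The token pattern, the function names, and the window convention (a window of 2 means adjacent pairs; a window of 3 also allows one intervening token) are assumptions made for this sketch. Pointwise mutual information stands in as one example measure, with marginals taken from the bigram counts themselves.

```python
import math
import re
from collections import Counter

# Hypothetical token pattern for this sketch: runs of word characters.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text, pattern=TOKEN_RE):
    """Split text into lowercased tokens wherever the regex matches."""
    return [m.group(0).lower() for m in pattern.finditer(text)]


def count_bigrams(tokens, window=2):
    """Count ordered pairs (w1, w2) where w2 follows w1 within `window` positions.

    window=2 yields adjacent bigrams only; window=3 also counts pairs
    separated by one intervening token (non-adjacent bigrams).
    """
    counts = Counter()
    for i, w1 in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            counts[(w1, tokens[j])] += 1
    return counts


def pmi_scores(bigrams):
    """Pointwise mutual information, log2(n11 * npp / (n1p * np1)), per bigram."""
    npp = sum(bigrams.values())  # total number of bigrams counted
    n1p = Counter()              # times each word occurs as the first member
    np1 = Counter()              # times each word occurs as the second member
    for (w1, w2), n in bigrams.items():
        n1p[w1] += n
        np1[w2] += n
    return {
        (w1, w2): math.log2(n * npp / (n1p[w1] * np1[w2]))
        for (w1, w2), n in bigrams.items()
    }


if __name__ == "__main__":
    toks = tokenize("New York is big. New York is busy.")
    adjacent = count_bigrams(toks)
    for pair, score in sorted(pmi_scores(adjacent).items(), key=lambda kv: -kv[1]):
        print(pair, adjacent[pair], round(score, 3))
```

Changing only the regex changes what counts as a token, and raising `window` admits gapped pairs without any other code change. Keeping those two knobs separate from the scoring step is what lets different association measures be swapped in over the same counts.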