Extracting statistical data frames from text

  • Authors:
  • Jisheng Liang;Krzysztof Koperski;Thien Nguyen;Giovanni Marchisio

  • Affiliations:
  • Insightful Corporation, Seattle, WA;Insightful Corporation, Seattle, WA;Insightful Corporation, Seattle, WA;Insightful Corporation, Seattle, WA

  • Venue:
  • ACM SIGKDD Explorations Newsletter - Natural language processing and text mining
  • Year:
  • 2005

Abstract

We present a framework that bridges the gap between natural language processing (NLP) and text mining. Central to this framework is a new approach to text parameterization that captures many interesting attributes of text usually ignored by standard indices, such as the term-document matrix. By storing NLP tags, the new index supports a higher degree of knowledge discovery and pattern finding from text. The index is relatively compact, enabling dynamic search of arbitrary relationships and events in large document collections. We can export search results in formats and data structures that are transparent to statistical analysis tools like S-PLUS®. In a number of experiments, we demonstrate how this framework can turn mountains of unstructured information into informative statistical graphs.
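
As a rough illustration of the idea in the abstract, the following minimal Python sketch stores role-tagged relations instead of bare term counts, searches them by role, and exports the hits as a flat table that a statistics package could read. It is not the authors' implementation: the `Relation` record, the `search`, `to_csv`, and `count_by` helpers, and the hand-tagged sample data are all hypothetical, and a real system would obtain the tags from an NLP parser.

```python
import csv
import io
from collections import Counter
from typing import NamedTuple


class Relation(NamedTuple):
    """One tagged relationship; a hypothetical stand-in for an index entry."""
    doc_id: str
    subject: str
    action: str
    obj: str


# Hand-tagged sample data (assumption): a real pipeline would get these from a parser.
RELATIONS = [
    Relation("d1", "Acme", "acquire", "Widgets Inc"),
    Relation("d2", "Acme", "acquire", "Gadget Co"),
    Relation("d2", "Gadget Co", "sue", "Acme"),
    Relation("d3", "Acme", "sue", "Widgets Inc"),
]


def search(relations, subject=None, action=None, obj=None):
    """Return relations matching every role constraint that is not None."""
    return [
        r for r in relations
        if (subject is None or r.subject == subject)
        and (action is None or r.action == action)
        and (obj is None or r.obj == obj)
    ]


def to_csv(relations):
    """Serialize relations as CSV text with one column per role."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(Relation._fields)
    writer.writerows(relations)
    return buf.getvalue()


def count_by(relations, field):
    """Count relations grouped by one field, e.g. for a bar chart."""
    return Counter(getattr(r, field) for r in relations)


# Role-aware query: who sued Acme? A bag-of-words index could not tell
# "Gadget Co sue Acme" apart from "Acme sue Gadget Co".
sued_acme = search(RELATIONS, action="sue", obj="Acme")

# Export everything Acme did as a data-frame-like table and summarize it.
hits = search(RELATIONS, subject="Acme")
table = to_csv(hits)
action_counts = count_by(hits, "action")
```

The CSV text in `table` has one row per matched relation and one column per role, which is the kind of rectangular structure that statistical tools load as a data frame; `action_counts` is the sort of aggregate that would feed a simple frequency plot.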