Extracting statistical data frames from text

Authors:
Jisheng Liang, Krzysztof Koperski, Thien Nguyen, Giovanni Marchisio
Affiliations:
Insightful Corporation, Seattle, WA (all four authors)
Venue:
ACM SIGKDD Explorations Newsletter - Natural language processing and text mining
Year:
2005

Abstract

We present a framework that bridges the gap between natural language processing (NLP) and text mining. Central to it is a new approach to text parameterization that captures many interesting attributes of text that standard indices, such as the term-document matrix, usually ignore. Because it stores NLP tags, the new index supports a higher degree of knowledge discovery and pattern finding from text. The index is relatively compact, which enables dynamic search for arbitrary relationships and events in large document collections. We can export search results in formats and data structures that are transparent to statistical analysis tools such as S-PLUS®. In a number of experiments, we demonstrate how this framework can turn mountains of unstructured information into informative statistical graphs.
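The abstract does not spell out the index layout, so the following is only a rough illustration of the general idea and not the authors' implementation. It assumes that each extracted relationship is stored as a hypothetical (subject, action, object) record carrying NLP entity-type tags. A term-document matrix would keep only word counts per document. The sketch also keeps a simple inverted list keyed by the action lemma, and it exports aggregated query results as CSV, a plain rows-and-columns format that statistical packages can generally import. The names `Event`, `EventIndex`, `search`, and `to_data_frame_csv`, along with the tag values, are invented for this example.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One tagged relationship extracted from a sentence (hypothetical schema)."""
    doc_id: str
    subject: str
    subject_type: str  # NLP entity tag, e.g. "PERSON", "ORG" (assumed tag set)
    action: str        # verb lemma
    obj: str
    obj_type: str


class EventIndex:
    """Toy inverted index over tagged events, keyed by action lemma."""

    def __init__(self):
        self._by_action = {}

    def add(self, event):
        self._by_action.setdefault(event.action, []).append(event)

    def search(self, action=None, subject_type=None, obj_type=None):
        """Return events matching every non-None constraint; None acts as a wildcard."""
        if action is None:
            candidates = [e for evs in self._by_action.values() for e in evs]
        else:
            candidates = self._by_action.get(action, [])
        return [
            e for e in candidates
            if (subject_type is None or e.subject_type == subject_type)
            and (obj_type is None or e.obj_type == obj_type)
        ]


def to_data_frame_csv(events, group_by=("subject", "obj")):
    """Count matching events per group and serialize them as a flat CSV table."""
    counts = Counter(tuple(getattr(e, f) for f in group_by) for e in events)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(list(group_by) + ["count"])
    for key, n in sorted(counts.items()):
        writer.writerow(list(key) + [n])
    return buf.getvalue()


# Usage with made-up events: find ORG-acquires-ORG relationships.
index = EventIndex()
index.add(Event("d1", "Acme", "ORG", "acquire", "Beta", "ORG"))
index.add(Event("d2", "Acme", "ORG", "acquire", "Beta", "ORG"))
index.add(Event("d3", "Acme", "ORG", "acquire", "Gamma", "ORG"))
index.add(Event("d4", "Alice", "PERSON", "visit", "Paris", "CITY"))

hits = index.search(action="acquire", subject_type="ORG", obj_type="ORG")
print(to_data_frame_csv(hits))
# subject,obj,count
# Acme,Beta,2
# Acme,Gamma,1
```

Because the tags are kept alongside each record, a query can constrain entity types as well as words, and the grouped counts come out as a table that is ready for plotting rather than as a ranked list of documents.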