Visual analysis of documents with semantic graphs

Authors:
Delia Rusu; Blaž Fortuna; Dunja Mladenić; Marko Grobelnik; Ruben Sipoš
Affiliations:
Jožef Stefan Institute, Ljubljana, Slovenia (all authors)
Venue:
Proceedings of the ACM SIGKDD Workshop on Visual Analytics and Knowledge Discovery: Integrating Automated Analysis with Interactive Exploration
Year:
2009

Visualization

Abstract

In this paper, we present a technique for visual analysis of documents based on a semantic representation of text in the form of a directed graph, referred to as a semantic graph. This approach can aid data mining tasks such as exploratory data analysis, data description, and summarization. To derive the semantic graph, we apply a natural language processing pipeline consisting of the following steps. First, named entities are identified and co-reference resolution is performed; in addition, pronominal anaphors are resolved for a subset of pronouns. Second, subject-predicate-object triplets are automatically extracted from the Penn Treebank parse tree obtained for each sentence in the document. The triplets are then enriched by linking them to their corresponding co-referenced named entities and by attaching the associated WordNet synsets, where available. The result is a semantic directed graph composed of connected triplets. The document's semantic graph serves as the starting point for automatically generating the document summary. The model for summary generation is obtained by machine learning, with features extracted from the structure and content of the semantic graph. The summary also has an associated semantic representation. The size of the semantic graph and the length of the summary can be adjusted manually for enhanced visual analysis. We also show how to employ the proposed technique for the Visual Analytics challenge.
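
The triplet extraction and graph construction steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the extraction heuristic (first noun in the subject NP, first verb and first noun in the VP), the function names, and the hand-written co-reference map are all assumptions made for illustration, and named entity recognition, WordNet linking, and summary generation are left out.

```python
from collections import defaultdict


def parse_tree(bracketed):
    """Parse a Penn Treebank bracketed string into nested (label, children) tuples."""
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()


def is_noun(tag):
    # Nouns (NN, NNS, NNP, NNPS) and personal pronouns (PRP), but not PRP$.
    return tag.startswith("NN") or tag == "PRP"


def is_verb(tag):
    return tag.startswith("VB")


def find_first(node, tag_matches):
    """Depth-first search for the first word whose POS tag satisfies tag_matches."""
    label, children = node
    for child in children:
        if isinstance(child, str):
            if tag_matches(label):
                return child
        else:
            found = find_first(child, tag_matches)
            if found:
                return found
    return None


def extract_triplet(sentence_tree):
    """Hypothetical heuristic: subject from the NP child of S, predicate and object from its VP child."""
    _, children = sentence_tree
    phrases = [c for c in children if not isinstance(c, str)]
    np = next((c for c in phrases if c[0] == "NP"), None)
    vp = next((c for c in phrases if c[0] == "VP"), None)
    if np is None or vp is None:
        return None
    subject = find_first(np, is_noun)
    predicate = find_first(vp, is_verb)
    obj = find_first(vp, is_noun)
    if subject and predicate and obj:
        return (subject, predicate, obj)
    return None


def build_graph(triplets, coref=None):
    """Build a directed graph: node -> list of (predicate, target node) edges.

    coref maps a mention (e.g. a pronoun) to its resolved entity, so triplets
    that mention the same entity end up attached to the same node.
    """
    coref = coref or {}
    graph = defaultdict(list)
    for subject, predicate, obj in triplets:
        graph[coref.get(subject, subject)].append((predicate, coref.get(obj, obj)))
    return dict(graph)


sentences = [
    "(S (NP (NNP Jobs)) (VP (VBD founded) (NP (NNP Apple))))",
    "(S (NP (PRP He)) (VP (VBD introduced) (NP (DT the) (NN iPhone))))",
]
triplets = [extract_triplet(parse_tree(s)) for s in sentences]
graph = build_graph(triplets, coref={"He": "Jobs"})  # co-reference map supplied by hand
print(graph)
# {'Jobs': [('founded', 'Apple'), ('introduced', 'iPhone')]}
```

Because the pronoun "He" is resolved to "Jobs" before the edges are added, the two triplets share a node, which is what connects individual triplets into a single graph.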