In this paper, we present a technique for the visual analysis of documents based on a semantic representation of text in the form of a directed graph, referred to as a semantic graph. This approach can aid data mining tasks such as exploratory data analysis, data description, and summarization. To derive the semantic graph, we apply a natural language processing pipeline consisting of the following steps. First, named entities are identified and co-reference resolution is performed; in addition, pronominal anaphors are resolved for a subset of pronouns. Second, subject-predicate-object triplets are automatically extracted from the Penn Treebank parse tree obtained for each sentence in the document. The triplets are then enriched by linking each one to its co-referenced named entity and, where available, attaching the associated WordNet synset. The result is a directed semantic graph composed of connected triplets. The document's semantic graph is the starting point for automatically generating a document summary. The summarization model is obtained by machine learning, using features extracted from the structure and content of the semantic graph. The summary also has an associated semantic representation. Both the size of the semantic graph and the length of the summary can be adjusted manually to support visual analysis. We also show how to apply the proposed technique to the Visual Analytics challenge.
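The triplet-extraction and graph-building steps can be illustrated with a minimal sketch. It assumes that parses are already available as Penn Treebank bracketed strings and that the co-reference step has produced a simple mention-to-entity map. The extraction rules used here are a simplified heuristic chosen for illustration, not necessarily the rules the authors use. The subject is the first noun (or pronoun) in the sentence's NP, the predicate is the verb in the innermost VP, and the object is the first noun or adjective in an NP, PP, or ADJP under that VP. WordNet synset attachment is omitted. The names `parse_ptb`, `extract_triplet`, and `build_semantic_graph` are hypothetical.

```python
from collections import defaultdict


def parse_ptb(s):
    """Read a Penn Treebank bracketed string into (label, children) tuples.

    Leaf words are plain strings; an unlabeled outer bracket "( (S ...) )"
    yields a root with label "".
    """
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1  # consume "("
        label = ""
        if tokens[pos] != "(":
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return read()


def subtrees(tree):
    """Return the non-leaf children of a node."""
    return [c for c in tree[1] if not isinstance(c, str)]


def first_with_prefix(tree, prefix):
    """Breadth-first search for the first preterminal whose tag starts with prefix."""
    queue = [tree]
    while queue:
        label, children = queue.pop(0)
        if label.startswith(prefix) and children and isinstance(children[0], str):
            return children[0]
        queue.extend(c for c in children if not isinstance(c, str))
    return None


def extract_triplet(tree):
    """Simplified subject-predicate-object heuristic (an assumption, see lead-in)."""
    s = tree
    # Unwrap ROOT or unlabeled wrappers until the clause node S is reached.
    while s[0] != "S" and len(subtrees(s)) == 1:
        s = subtrees(s)[0]
    np = next((c for c in subtrees(s) if c[0] == "NP"), None)
    vp = next((c for c in subtrees(s) if c[0] == "VP"), None)
    if np is None or vp is None:
        return None
    subject = first_with_prefix(np, "NN") or first_with_prefix(np, "PRP")
    # Descend through nested VPs ("has read") to the innermost one.
    while True:
        inner = next((c for c in subtrees(vp) if c[0] == "VP"), None)
        if inner is None:
            break
        vp = inner
    predicate = next((c[1][0] for c in subtrees(vp) if c[0].startswith("VB")), None)
    obj = None
    for c in subtrees(vp):
        if c[0] in ("NP", "PP", "ADJP"):
            obj = first_with_prefix(c, "NN") or first_with_prefix(c, "JJ")
            if obj:
                break
    if None in (subject, predicate, obj):
        return None
    return subject, predicate, obj


def build_semantic_graph(parses, coref=None):
    """Connect triplets into a directed graph: subject -> [(predicate, object)].

    `coref` is a hypothetical mention -> canonical-entity map, so that
    "He" and "John" collapse into one node.
    """
    coref = coref or {}
    graph = defaultdict(list)
    for p in parses:
        t = extract_triplet(parse_ptb(p))
        if t is None:
            continue
        subj, pred, obj = t
        graph[coref.get(subj, subj)].append((pred, coref.get(obj, obj)))
    return dict(graph)
```

For example, given the two parses `(ROOT (S (NP (NNP John)) (VP (VBD bought) (NP (DT a) (NN car))) (. .)))` and `(ROOT (S (NP (PRP He)) (VP (VBD drove) (PP (TO to) (NP (NNP Paris)))) (. .)))` with the map `{"He": "John"}`, the sketch merges both triplets under a single `John` node: `{"John": [("bought", "car"), ("drove", "Paris")]}`. Merging on the resolved entity is what turns isolated triplets into a connected graph.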