Graph-based text representation and knowledge discovery

Authors:
Wei Jin;Rohini K. Srihari
Affiliations:
State University of New York at Buffalo, NY;State University of New York at Buffalo, NY
Venue:
Proceedings of the 2007 ACM symposium on Applied computing
Year:
2007

Abstract

For information retrieval and text mining, a robust, scalable framework is required to represent the information extracted from documents and to enable visualization and querying of that information. One very widely used model is the vector space model, which is based on the bag-of-words approach. However, it loses important information about the original text, such as the order of the terms or the boundaries between sentences and paragraphs. In this paper, we propose a graph-based text representation that captures (i) term order, (ii) term frequency, (iii) term co-occurrence, and (iv) term context in documents. We also apply the graph model to our text mining task, which is to discover unapparent associations between two or more concepts (e.g. individuals) in a large text corpus. A counterterrorism corpus is used to evaluate the performance of various retrieval models, and the results demonstrate the feasibility and effectiveness of graph-based text representation in information retrieval and text mining.
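
To make the four properties listed in the abstract concrete, the following is a minimal Python sketch of one way a term graph could record them. It is an illustration under stated assumptions, not the model proposed in the paper: the function name `build_term_graph`, the `window` parameter, the regex-based sentence and term splitting, and the choice to store context as sentence indices are all hypothetical.

```python
import re
from collections import Counter, defaultdict


def build_term_graph(text, window=2):
    """Build a toy term graph from raw text (illustrative assumption only).

    Nodes are terms. The returned dict holds:
      term_freq   - how often each term occurs (term frequency)
      order_edges - directed edges (a, b): a is immediately followed by b (term order)
      cooccur     - undirected pairs seen within `window` following terms
                    of the same sentence (term co-occurrence)
      contexts    - indices of the sentences each term appears in (term context)
    """
    term_freq = Counter()
    order_edges = Counter()
    cooccur = Counter()
    contexts = defaultdict(set)

    # Naive sentence splitting on ., ! and ?; a real system would use a proper tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    for idx, sentence in enumerate(sentences):
        terms = re.findall(r"[a-z0-9]+", sentence.lower())
        for i, term in enumerate(terms):
            term_freq[term] += 1
            contexts[term].add(idx)
            # Directed edge to the next term; edges never cross sentence boundaries.
            if i + 1 < len(terms):
                order_edges[(term, terms[i + 1])] += 1
            # Undirected co-occurrence with the next `window` terms in the sentence.
            for other in terms[i + 1 : i + 1 + window]:
                if other != term:
                    cooccur[frozenset((term, other))] += 1

    return {
        "term_freq": term_freq,
        "order_edges": order_edges,
        "cooccur": cooccur,
        "contexts": dict(contexts),
    }
```

Unlike a bag-of-words vector, this structure distinguishes "Alice met Bob" from "Bob met Alice" through the direction of its order edges, and it keeps sentence boundaries because no edge spans two sentences.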