Text classification using graph mining-based feature extraction

Authors:
Chuntao Jiang;Frans Coenen;Robert Sanderson;Michele Zito
Affiliations:
The University of Liverpool, Department of Computer Science, Ashton Building, Ashton Street, Liverpool, L69 3BX, United Kingdom (all authors)
Venue:
Knowledge-Based Systems
Year:
2010

Abstract

A graph-based approach to document classification is described in this paper. The graph representation allows for a much more expressive document encoding than the more standard bag of words/phrases approach, and consequently gives improved classification accuracy. Document sets are represented as graph sets, to which a weighted graph mining algorithm is applied to extract frequent subgraphs; these are then further processed to produce feature vectors (one per document) for classification. Weighted subgraph mining is used to ensure classification effectiveness and computational efficiency: only the most significant subgraphs are extracted. The approach is validated and evaluated using several popular classification algorithms together with a real-world textual data set. The results demonstrate that the approach can outperform existing text classification algorithms on some data sets. As the size of the data set increases, further processing of the extracted frequent features becomes essential.
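The pipeline outlined in the abstract (documents to graphs, frequent substructures to features, one feature vector per document) can be illustrated with a heavily simplified sketch. It is not the authors' algorithm. It assumes each document graph is just the set of directed edges between adjacent words, it treats single edges as a stand-in for general subgraphs, and it uses plain (unweighted) support rather than the weighted scheme the paper describes. The function names `doc_to_graph`, `mine_frequent_edges` and `to_feature_vectors`, the toy documents, and the threshold are all hypothetical.

```python
from collections import Counter


def doc_to_graph(text):
    """Assumed encoding: a document graph is the set of directed
    edges (word, next_word) between adjacent words."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def mine_frequent_edges(graphs, min_support):
    """Keep edges that occur in at least `min_support` (a fraction)
    of the graphs. Single edges stand in for general subgraphs here;
    a real frequent subgraph miner would grow larger patterns."""
    doc_freq = Counter()
    for graph in graphs:
        doc_freq.update(graph)  # each edge counted once per graph
    n = len(graphs)
    return sorted(edge for edge, count in doc_freq.items()
                  if count / n >= min_support)


def to_feature_vectors(graphs, features):
    """One binary vector per document: 1 if the document graph
    contains the frequent edge, 0 otherwise."""
    return [[1 if f in graph else 0 for f in features] for graph in graphs]


# Toy documents (hypothetical, for illustration only).
docs = [
    "graph mining finds frequent patterns",
    "graph mining extracts features",
    "text classification uses features",
]
graphs = [doc_to_graph(d) for d in docs]
features = mine_frequent_edges(graphs, min_support=0.5)
vectors = to_feature_vectors(graphs, features)
```

The resulting vectors could then be passed to any standard classifier. Raising `min_support` shrinks the feature set, which loosely mirrors the abstract's point that only the most significant substructures should be kept.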