Transactions on rough sets XII
This paper studies the problem of classifying web documents. Documents are represented with a graph-based model instead of the traditional vector-based one. A novel classification algorithm is proposed that uses two types of structural patterns (subgraphs): contrast and common. The approach is closely related to the classical emerging-pattern techniques known from decision tables. The method is evaluated for classification accuracy on three benchmark web document collections. Results show that it can outperform existing algorithms (based on vector, graph, and hybrid document representations) in terms of accuracy and document model complexity. A further advantage is that the classifier has a simple, understandable structure and can easily be extended with expert knowledge.
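The idea of scoring a document by contrast and common patterns can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a hypothetical graph model in which nodes are terms and directed edges link adjacent terms, and it restricts patterns to single edges rather than general subgraphs. The function names (`doc_to_edges`, `train`, `classify`), the `common_weight` parameter, and the scoring formula are all assumptions made for illustration.

```python
from collections import defaultdict


def doc_to_edges(text):
    """Assumed graph model: terms are nodes, adjacent terms form a directed edge."""
    words = text.lower().split()
    return frozenset(zip(words, words[1:]))


def train(labeled_docs):
    """Per class, compute the support of each edge (fraction of documents containing it)."""
    counts = defaultdict(lambda: defaultdict(int))
    sizes = defaultdict(int)
    for text, label in labeled_docs:
        sizes[label] += 1
        for edge in doc_to_edges(text):
            counts[label][edge] += 1
    return {
        label: {edge: n / sizes[label] for edge, n in edges.items()}
        for label, edges in counts.items()
    }


def classify(text, support, common_weight=0.5):
    """Score each class with contrast and common single-edge patterns (illustrative scoring)."""
    edges = doc_to_edges(text)
    scores = {}
    for label, own_support in support.items():
        score = 0.0
        for edge in edges:
            own = own_support.get(edge, 0.0)
            if own == 0.0:
                continue
            others = [s.get(edge, 0.0) for other, s in support.items() if other != label]
            if all(o == 0.0 for o in others):
                # Contrast pattern: occurs in this class and in no other class.
                score += own
            else:
                # Shared pattern: down-weighted by its relative support (an assumption).
                score += common_weight * own * own / (own + sum(others))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data, invented for this example.
DOCS = [
    ("cheap flights to rome", "travel"),
    ("book cheap flights now", "travel"),
    ("python graph mining library", "tech"),
    ("graph mining with python", "tech"),
]
support = train(DOCS)
predicted = classify("cheap flights today", support)
```

Because every pattern that contributes to a score is an explicit edge with a per-class support value, such a classifier stays inspectable, and hand-picked patterns could in principle be added to the support tables directly.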