Matching semi-structured documents using similarity of regions through fuzzy rule-based system

  • Authors:
  • Alireza Ensan; Yevgen Biletskiy

  • Affiliations:
  • University of New Brunswick, Fredericton, New Brunswick, Canada; University of New Brunswick, Fredericton, New Brunswick, Canada

  • Venue:
  • ICDM'13 Proceedings of the 13th international conference on Advances in Data Mining: applications and theoretical aspects
  • Year:
  • 2013

Abstract

This paper briefly describes a novel approach to categorizing semi-structured documents using a fuzzy rule-based system. We propose a fuzzy logic representation for semi-structured documents and then, by introducing a new metric, categorize the documents into classes. The idea behind our approach is to divide web pages into semantic sections and to use a fuzzy logic system to extract features and weight the harvested terms, which together represent each semi-structured document. A set of metrics is also used to measure the similarity between documents based on the weight of each region in the text. A clustering algorithm that groups the documents into several categories is also explained. The idea is inspired by, and positioned as a subfield of, Matchmaking, which tries to match document creators and users in order to find the best similarities between them and connect them for further collaboration.
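
The region-weighted similarity described above can be illustrated with a minimal sketch. It is not the authors' metric, which the abstract does not specify. It assumes that each document has already been split into named regions, that each region is a sparse term-weight vector, and that per-region similarity is plain cosine similarity. The region names, the values in REGION_WEIGHTS, and the function names are hypothetical; in the paper the term and region weights would come from the fuzzy rule-based system rather than from fixed constants.

```python
import math

# Hypothetical region weights (assumption): the abstract gives no actual values.
REGION_WEIGHTS = {"title": 0.5, "headings": 0.3, "body": 0.2}


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty region contributes no similarity
    return dot / (norm_a * norm_b)


def region_similarity(doc_a, doc_b, region_weights=REGION_WEIGHTS):
    """Weighted average of per-region cosine similarities.

    Each document is a dict mapping a region name to a term-weight dict.
    A region missing from a document is treated as empty.
    """
    total = sum(region_weights.values())
    score = sum(
        weight * cosine(doc_a.get(region, {}), doc_b.get(region, {}))
        for region, weight in region_weights.items()
    )
    return score / total


# Two toy documents that agree only in their title region.
doc_a = {"title": {"fuzzy": 1.0}, "headings": {"rules": 1.0}, "body": {"logic": 1.0}}
doc_b = {"title": {"fuzzy": 1.0}, "headings": {"cluster": 1.0}, "body": {"web": 1.0}}
print(region_similarity(doc_a, doc_b))
```

Because the score is a weighted average, agreement in a heavily weighted region such as the title counts for more than the same agreement in the body. A pairwise score of this kind could then feed a clustering step like the one the abstract mentions.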