Web Directory Integration Using Conditional Random Fields

  • Authors:
  • Terry Chia-Wei Wu;Wen-Lian Hsu

  • Affiliations:
  • Academia Sinica, Taiwan;Academia Sinica, Taiwan/ National Tsing-Hua University, Taiwan

  • Venue:
  • WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
  • Year:
  • 2006

Abstract

The purpose of integrating web directories is to transfer instances from a source directory to a target directory. Unlike conventional text categorization, directory integration has extra information about the source directory that can be used to improve classification accuracy. Many approaches exploit the measured similarity between two corresponding classes to enhance traditional text classifiers. These methods perform well if the topics of the two classes are very similar, but they can lead to misclassification if the topics are dissimilar. We propose a directory integration approach based on the conditional random fields (CRFs) model, and we model the integration process as a finite-state model. The advantage of using CRFs is that the transition features naturally include information about the relations between classes. Our results show that CRFs outperform conventional text classifiers. In addition, CRFs allow us to apply complex features that integrate information about the contents of classes and their labels. The performance of our approach can be improved by applying these features, especially for instances whose source and target classes are moderately similar.
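
To illustrate how transition features can carry information about the relation between a source class and a target class, the following is a minimal sketch and not the authors' implementation. It assumes a two-state chain (source class, then target class), hand-set weights instead of learned ones, and hypothetical names throughout (`target_class_probabilities`, `transition_w`, `emission_w`, and the example classes and features).

```python
import math

def target_class_probabilities(source_class, doc_features,
                               transition_w, emission_w, target_classes):
    """Score each target class as transition weight + content weight,
    then normalize with a softmax (a log-linear, CRF-style distribution).

    All names and weights are illustrative assumptions, not taken from the paper.
    """
    scores = {}
    for target in target_classes:
        # Transition feature: how strongly the source class points to this target class.
        trans = transition_w.get((source_class, target), 0.0)
        # Content features: how well the document's terms match this target class.
        emit = sum(emission_w.get((target, feat), 0.0) * value
                   for feat, value in doc_features.items())
        scores[target] = trans + emit

    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {t: math.exp(s - top) / z for t, s in scores.items()}

# Hypothetical toy data: one document from source class "Fitness".
target_classes = ["Sports", "Health"]
transition_w = {("Fitness", "Health"): 1.5, ("Fitness", "Sports"): 0.5}
emission_w = {("Sports", "match"): 1.0, ("Health", "diet"): 1.0}
doc_features = {"diet": 1.0}

probs = target_class_probabilities("Fitness", doc_features,
                                   transition_w, emission_w, target_classes)
best = max(probs, key=probs.get)
```

In this toy setting the transition weight acts as a prior learned from how source and target classes co-occur, while the content weights can still override it when the document's text disagrees.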