The purpose of integrating web directories is to transfer instances from a source directory to a target directory. Unlike conventional text categorization, directory integration has access to extra information about the source directory that can be used to improve classification accuracy. Many approaches exploit the measured similarity between two corresponding classes to enhance traditional text classifiers. These methods perform well if the topics of the two classes are very similar, but they can lead to misclassification if the topics are dissimilar. We propose a directory integration approach based on conditional random fields (CRFs), in which the integration process is modeled as a finite-state model. The advantage of using CRFs is that the transition features naturally include information about the relations between classes. Our results show that CRFs outperform conventional text classifiers. In addition, CRFs allow us to apply complex features that integrate information about the contents of classes and their labels. Applying these features can improve the performance of our approach, especially for instances whose source and target classes are moderately similar.
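To make the role of transition features concrete, the following is a minimal sketch of a log-linear scorer and not the model or feature set used in this work. It assumes a toy setting in which each instance has one source class and one candidate target class. The function name, class names, and hand-set weights are all hypothetical. The score of a target class is the sum of content (state) feature weights and a single source-to-target transition weight, normalised over the candidate targets.

```python
import math

def target_posteriors(words, source_class, emit_w, trans_w, targets):
    """Return p(target | words, source_class) for a toy log-linear model."""
    scores = {}
    for t in targets:
        # State (content) features: one weight per (target class, word) pair.
        s = sum(emit_w.get((t, w), 0.0) for w in words)
        # Transition feature: one weight per (source class, target class) pair.
        s += trans_w.get((source_class, t), 0.0)
        scores[t] = s
    # Normalise with a numerically stable softmax.
    m = max(scores.values())
    z = sum(math.exp(v - m) for v in scores.values())
    return {t: math.exp(v - m) / z for t, v in scores.items()}

# Hypothetical hand-set weights, for illustration only.
targets = ["Autos", "Sports"]
emit_w = {("Autos", "engine"): 1.0, ("Sports", "race"): 1.0}
trans_w = {("Cars", "Autos"): 2.0}

words = ["engine", "race"]
with_source = target_posteriors(words, "Cars", emit_w, trans_w, targets)
without_source = target_posteriors(words, None, emit_w, trans_w, targets)
```

In this toy case the content features alone leave the two target classes tied, and the transition weight from the source class "Cars" tips the decision toward "Autos". In a trained model the transition weights would be learned from data, so a weak relation between classes would receive a small weight and would not override the content evidence.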