Web taxonomy integration with hierarchical shrinkage algorithm and fine-grained relations

Authors:
Chia-Wei Wu;Richard Tzong-Han Tsai;Cheng-Wei Lee;Wen-Lian Hsu
Affiliations:
Institute of Information Science, Academia Sinica, Taipei, Taiwan;Department of Computer Science and Engineering, Yuan-Ze University, Chung-Li, Taiwan;Institute of Information Science, Academia Sinica, Taipei, Taiwan and Department of Computer Science, National Tsing-Hua University, Hsingchu, Taiwan;Institute of Information Science, Academia Sinica, Taipei, Taiwan and Department of Computer Science, National Tsing-Hua University, Hsingchu, Taiwan
Venue:
Expert Systems with Applications: An International Journal
Year:
2008

Abstract

We address the problem of integrating web taxonomies from different real Internet applications. Integrating web taxonomies means transferring instances from a source taxonomy to a target taxonomy. Unlike conventional text categorization, taxonomy integration can exploit extra information in the source taxonomy to improve categorization. The major existing methods fall into two types: those that use neighboring categories to smooth the document term vector, and those that use the semantic relationships between corresponding categories of the target and source taxonomies to facilitate categorization. In contrast to the first type of approach, which uses only a flattened hierarchy for smoothing, we apply a hierarchical shrinkage algorithm that smooths child documents with their parents. We also discuss the effect of using different hierarchical levels for smoothing. To extend the second type of approach, we extract fine-grained semantic relationships, which take into account the relationships between lower-level categories. In addition, we use cosine similarity to measure the semantic relationships, which achieves better performance than existing methods. Finally, we integrate the existing approaches and the proposed methods into one machine learning model to find the best feature configuration. Experiments on real Internet data demonstrate that our system outperforms standard text classifiers by about 10%.
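
The two ingredients named in the abstract, smoothing a child's term vector with its ancestors and comparing categories by cosine similarity, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (normalize, shrink, cosine), the toy term counts, and the fixed interpolation weights are all assumptions made for illustration. The abstract does not say how the paper sets its shrinkage weights.

```python
import math
from collections import Counter


def normalize(counts):
    """Turn raw term counts into a probability distribution over terms."""
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def shrink(child_counts, ancestor_counts, weights):
    """Interpolate a child's term distribution with those of its ancestors.

    weights[0] applies to the child and weights[1:] to the ancestors in
    order (parent first). Fixed weights are an assumption of this sketch.
    """
    if len(weights) != len(ancestor_counts) + 1:
        raise ValueError("need exactly one weight per hierarchy level")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    levels = [normalize(child_counts)] + [normalize(a) for a in ancestor_counts]
    smoothed = Counter()
    for weight, dist in zip(weights, levels):
        for term, prob in dist.items():
            smoothed[term] += weight * prob
    return dict(smoothed)


def cosine(u, v):
    """Cosine similarity between two sparse term vectors stored as dicts."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy data (hypothetical): a document and the pooled terms of its parent category.
doc = Counter({"laptop": 2, "battery": 1})
parent = Counter({"laptop": 5, "computer": 5})
smoothed_doc = shrink(doc, [parent], [0.7, 0.3])

# Toy category vectors from a source and a target taxonomy (hypothetical).
source_cat = {"laptop": 3.0, "computer": 1.0}
target_cat = {"laptop": 2.0, "notebook": 2.0}
similarity = cosine(source_cat, target_cat)
```

In this sketch the smoothed document gains probability mass for "computer", a term it never contained, because its parent category does. Passing more than one ancestor to shrink is one way to vary how many hierarchy levels take part in smoothing.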