Learning to integrate web taxonomies

Authors:
Dell Zhang; Wee Sun Lee
Affiliations:
Department of Computer Science, School of Computing, S16-05-08, 3 Science Drive 2, National University of Singapore, Singapore 117543, Singapore and Singapore-MIT Alliance, E4-04-10, 4 Engineering ...;Department of Computer Science, School of Computing, SOC1-05-26, 3 Science Drive 2, National University of Singapore, Singapore 117543, Singapore and Singapore-MIT Alliance, E4-04-10, 4 Engineerin ...
Venue:
Web Semantics: Science, Services and Agents on the World Wide Web
Year:
2004


Abstract

We investigate machine learning methods for automatically integrating objects from different taxonomies into a master taxonomy. This problem is already pervasive on the Web and is also important to the emerging Semantic Web. A straightforward way to automate the process is to train classifiers and then use them to assign objects from the source taxonomies to categories of the master taxonomy. However, conventional machine learning algorithms ignore the information carried by the source taxonomies, even though source and master taxonomies often share categories under different names or overlap semantically in more complex ways. We introduce two techniques that exploit this semantic overlap to build better classifiers for the master taxonomy. The first technique, Cluster Shrinkage, biases the learning algorithm against splitting source categories by making objects in the same category appear more similar to each other. The second technique, Co-Bootstrapping, aims to exploit inter-taxonomy relationships by providing category indicator functions as additional features for the objects. Our experiments with real-world Web data show that these add-on techniques can enhance various machine learning algorithms and substantially improve their performance on taxonomy integration.
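The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the linear interpolation toward a category centroid, the parameter `lam`, and the function names `cluster_shrinkage` and `add_category_indicators` are all assumptions made for illustration. The second function shows only the feature-augmentation step (appending category indicators), not the full Co-Bootstrapping procedure.

```python
from collections import defaultdict


def cluster_shrinkage(vectors, source_labels, lam=0.5):
    """Pull each vector toward the centroid of its source category.

    Assumption (not from the abstract): shrinkage is a linear
    interpolation x' = (1 - lam) * x + lam * centroid.
    """
    groups = defaultdict(list)
    for vec, label in zip(vectors, source_labels):
        groups[label].append(vec)
    # Per-category centroid: the mean of each feature column.
    centroids = {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }
    return [
        [(1 - lam) * x + lam * c for x, c in zip(vec, centroids[label])]
        for vec, label in zip(vectors, source_labels)
    ]


def add_category_indicators(vectors, source_labels):
    """Append one 0/1 indicator feature per source category."""
    categories = sorted(set(source_labels))
    return [
        list(vec) + [1.0 if label == cat else 0.0 for cat in categories]
        for vec, label in zip(vectors, source_labels)
    ]


# Hypothetical toy data: three documents from two source categories.
docs = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
labels = ["Movies", "Movies", "Music"]

shrunk = cluster_shrinkage(docs, labels, lam=0.5)
augmented = add_category_indicators(docs, labels)
```

With `lam=0.5`, the two "Movies" documents move halfway toward their shared centroid and so end up closer to each other, which is the sense in which a downstream classifier is discouraged from splitting a source category. The augmented vectors carry the source category as explicit features that a learner for the master taxonomy could pick up.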