Labeling Nodes of Automatically Generated Taxonomy for Multi-type Relational Datasets

Authors: Tao Li; Sarabjot S. Anand
Affiliation: Department of Computer Science, University of Warwick, Coventry, United Kingdom (both authors)
Venue: DaWaK '08: Proceedings of the 10th International Conference on Data Warehousing and Knowledge Discovery
Year: 2008

Abstract

Automatic taxonomy generation organizes a large dataset into a hierarchical structure to facilitate navigation and browsing. To summarize the content of each node and to reflect how it differs from its siblings, meaningful labels need to be assigned to all nodes within a derived taxonomy. Current research focuses only on labeling taxonomies built from a corpus of textual documents. In this paper we address the problem of labeling taxonomies built for multi-type relational datasets. We propose a novel measure that uses information-theoretic techniques to quantitatively evaluate the homogeneity of each node and the heterogeneity between it and its sibling nodes; the labels of taxonomic nodes are determined from this measure. Experiments on a real dataset demonstrate the effectiveness of our method.
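
The abstract does not define the proposed measure, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that each object in a node carries categorical attribute values, scores homogeneity as one minus the normalized Shannon entropy of an attribute inside the node, and scores heterogeneity as the Jensen-Shannon divergence between the node's and its siblings' value distributions. All function names, the product-based combination, and the toy movie data are hypothetical.

```python
import math
from collections import Counter


def distribution(values):
    """Empirical probability distribution over a list of categorical values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; lies in [0, 1]."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def attribute_score(node_values, sibling_values):
    """Assumed combination: homogeneity inside the node times heterogeneity to siblings."""
    p = distribution(node_values)
    q = distribution(sibling_values)
    max_entropy = math.log2(len(set(p) | set(q)))
    homogeneity = 1.0 if max_entropy == 0 else 1.0 - entropy(p) / max_entropy
    heterogeneity = js_divergence(p, q)
    return homogeneity * heterogeneity


def choose_label(node, siblings):
    """Pick the best-scoring attribute and label the node with its dominant value.

    `node` and `siblings` map attribute names to lists of values; `siblings`
    pools the objects of all sibling nodes.
    """
    best = max(node, key=lambda attr: attribute_score(node[attr], siblings[attr]))
    dominant_value = Counter(node[best]).most_common(1)[0][0]
    return best, dominant_value


# Hypothetical toy data: one node of movies and the pooled objects of its siblings.
node = {
    "genre": ["comedy", "comedy", "comedy", "comedy"],
    "decade": ["1990s", "2000s", "1990s", "2000s"],
}
siblings = {
    "genre": ["drama", "drama", "horror", "drama"],
    "decade": ["1990s", "2000s", "2000s", "1990s"],
}

label = choose_label(node, siblings)
```

In this toy case the node is uniform in `genre` and shares no genre with its siblings, while `decade` is distributed identically in both groups, so `genre` is the attribute selected for the label. A multi-type relational setting would also need to handle attributes reached through relations to other object types, which this sketch leaves out.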