The TaxGen Framework: Automating the Generation of a Taxonomy for a Large Document Collection

Authors:
Adrian Müller; Jochen Dörre
Venue:
HICSS '99: Proceedings of the Thirty-Second Annual Hawaii International Conference on System Sciences, Volume 2
Year:
1999

Citing 0
Cited by 8:

A practical web-based approach to generating topic hierarchy for text segments
Proceedings of the thirteenth ACM international conference on Information and knowledge management

Taxonomy generation for text segments: A practical web-based approach
ACM Transactions on Information Systems (TOIS)

Automatically labeling hierarchical clusters
dg.o '06: Proceedings of the 2006 international conference on Digital government research

Labeling Nodes of Automatically Generated Taxonomy for Multi-type Relational Datasets
DaWaK '08: Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery

Document Clustering Description Extraction and Its Application
ICCPOL '09: Proceedings of the 22nd International Conference on Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy

Choosing your own adventure: automatic taxonomy generation to permit many paths
CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management

Improving hierarchical document cluster labels through candidate term selection
Intelligent Decision Technologies

Mining semantic relations between research areas
ISWC '12: Proceedings of the 11th international conference on The Semantic Web, Volume Part I

Abstract

Text mining is an active area of research and development that combines and extends techniques from related fields such as information retrieval, computational linguistics, and data mining to analyze large corpora of digital documents. This paper describes the TaxGen text mining project, carried out at the IBM Software Development Laboratory in Boeblingen, Germany. The goal of TaxGen was to generate a taxonomy automatically for a collection of previously unstructured documents, namely a set of 73,000 newswire documents spanning one year.