Automatic Category Theme Identification and Hierarchy Generation for Chinese Text Categorization

Authors:
Hsin-Chang Yang;Chung-Hong Lee
Affiliations:
Department of Information Management, Chang Jung University, Tainan, Taiwan;Department of Electrical Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan
Venue:
Journal of Intelligent Information Systems
Year:
2005

Abstract

Research on text mining has recently attracted considerable attention from both industry and academia. Text mining is concerned with discovering unknown patterns or knowledge in a large text repository. The problem is hard to tackle because the texts under consideration are semi-structured or even unstructured. Many approaches have been devised for mining various kinds of knowledge from texts. One important aspect of text mining is automatic text categorization, which assigns a text document to a predefined category if the document falls within the theme of that category. Traditionally, the categories are arranged hierarchically to support effective searching and indexing and to make them easy for people to comprehend. The category themes and their hierarchical structure have mostly been determined by human experts. In this work, we developed an approach that automatically generates category themes and reveals the hierarchical structure among them. We also used the generated structure to categorize text documents. A self-organizing map was trained on the document collection to form two feature maps. These maps were then analyzed to obtain the category themes and their structure. Although the test corpus contains documents written in Chinese, the proposed approach can be applied to documents written in any language, provided that such documents can be transformed into a list of separated terms.
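
The abstract does not spell out how the self-organizing map is trained or how the two feature maps are read off it. As a rough illustration only, the sketch below trains a tiny self-organizing map on term vectors in plain Python and then labels neurons in two ways: by the documents they win and by the terms they weight most strongly. The function names (train_som, best_matching_unit, document_map, word_map), the decay schedules, the square neighbourhood, and the toy data are all assumptions made for this example and are not taken from the paper.

```python
import random

def best_matching_unit(weights, v):
    """Index of the neuron whose weights are closest to v (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(weights[j], v)))

def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM on equal-length numeric vectors.

    Hypothetical settings: linear decay of learning rate and radius,
    square (Chebyshev-distance) neighbourhood on the grid.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per neuron, initialised with random values in [0, 1).
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate shrinks
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighbourhood shrinks
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass
            win_r, win_c = divmod(best_matching_unit(weights, v), cols)
            for j, w in enumerate(weights):
                r, c = divmod(j, cols)
                # Update the winner and its grid neighbours toward v.
                if max(abs(r - win_r), abs(c - win_c)) <= radius:
                    for k in range(dim):
                        w[k] += lr * (v[k] - w[k])
    return weights

def document_map(weights, vectors):
    """Label each neuron with the indices of the documents it wins."""
    labels = {}
    for i, v in enumerate(vectors):
        labels.setdefault(best_matching_unit(weights, v), []).append(i)
    return labels

def word_map(weights, vocab):
    """Label each neuron with the terms whose weight is largest there."""
    labels = {}
    for k, term in enumerate(vocab):
        j = max(range(len(weights)), key=lambda n: weights[n][k])
        labels.setdefault(j, []).append(term)
    return labels

# Toy data (assumed): rows are documents, columns are term occurrences.
vocab = ["stock", "market", "goal", "match"]
docs = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
som = train_som(docs, rows=2, cols=2)
print(document_map(som, docs))
print(word_map(som, vocab))
```

Neurons that collect both related documents and related terms would be natural candidates for category themes in such a scheme. How the paper actually derives themes and the hierarchy from its two maps is not stated in the abstract, so this step is left out of the sketch.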