A Topic Model of Observing Chinese Characters

Authors:
Yunkai Zhang;Zengchang Qin
Venue:
IHMSC '10 Proceedings of the 2010 Second International Conference on Intelligent Human-Machine Systems and Cybernetics - Volume 02
Year:
2010

Citing 0
Cited 1

What is the basic semantic unit of Chinese language? a computational approach based on topic models

MOL'11 Proceedings of the 12th biennial conference on The mathematics of language

Abstract

Topic models are a class of hierarchical statistical models for analyzing document collections, and they have become one of the most widely used techniques in natural language processing in recent years. They assume that each document can be expressed as a mixture of topics and that each topic can be characterized by a distribution over words. In previous research [6], topic models for Chinese, like those for English, use words as the observed data. In this research, we demonstrate the effectiveness of using Chinese characters as the basic units of observed data. Comparisons with models based on Chinese words and on English words are presented.
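
The difference between word-level and character-level observations can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`word_units`, `character_units`, `bag_of_units`) and the two sample sentences are hypothetical, and the word-level input is assumed to be already segmented, with words separated by spaces. The sketch only builds the bag-of-units counts that a topic model would take as input; it does not fit any model.

```python
from collections import Counter


def word_units(segmented_text):
    """Word-level units: assumes the text is pre-segmented with spaces."""
    return segmented_text.split()


def character_units(segmented_text):
    """Character-level units: every non-whitespace character is one unit."""
    return [ch for ch in segmented_text if not ch.isspace()]


def bag_of_units(docs, tokenizer):
    """Turn each document into a bag (multiset) of observed units."""
    return [Counter(tokenizer(doc)) for doc in docs]


# Hypothetical sample documents, pre-segmented into words by spaces.
docs = [
    "我 喜欢 自然 语言 处理",
    "自然 语言 很 有趣",
]

word_bags = bag_of_units(docs, word_units)
char_bags = bag_of_units(docs, character_units)

# Vocabulary size under each choice of observed unit.
word_vocab = set().union(*word_bags)
char_vocab = set().union(*char_bags)
```

With character units no word segmenter is needed, since the spaces are simply ignored. Note that this sketch would also count punctuation marks as units, so a real pipeline would likely filter them first.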