In this paper we describe a method of classifying Japanese text documents using domain-specific kanji characters. Text documents are generally classified by their significant words (keywords). However, these significant words are difficult to extract from Japanese text, because Japanese is written without blank spaces as delimiters and must first be segmented into words. Therefore, instead of words, we used domain-specific kanji characters, which appear more frequently in one domain than in the others. We extracted these domain-specific kanji characters by the χ² method. Then, using these characters, we classified editorial columns "TENSEI JINGO", editorial articles, and articles in "Scientific American (in Japanese)". The correct recognition scores were 47%, 74%, and 85%, respectively.
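The general idea can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: the abstract does not give the exact form of the χ² statistic, the selection threshold, or the scoring rule, so all of these are assumptions made here for illustration. The function names (`domain_specific_kanji`, `classify`), the `threshold` parameter, and the choice to treat the CJK Unified Ideographs range U+4E00 to U+9FFF as "kanji" are hypothetical. A kanji is kept for a domain when it occurs more often than expected there and its cell contribution to χ², (observed − expected)² / expected, reaches the threshold; a document is then assigned to the domain whose selected kanji it contains most often.

```python
from collections import Counter


def is_kanji(ch):
    # Assumption: "kanji" means the CJK Unified Ideographs block only.
    return "\u4e00" <= ch <= "\u9fff"


def kanji_counts(text):
    # Count kanji characters; kana, punctuation, and Latin text are ignored.
    return Counter(ch for ch in text if is_kanji(ch))


def domain_specific_kanji(corpus, threshold):
    """corpus maps a domain name to a list of training texts.

    Returns a dict mapping each domain to the set of kanji that are
    over-represented in it according to a chi-square cell contribution.
    """
    counts = {
        d: sum((kanji_counts(t) for t in texts), Counter())
        for d, texts in corpus.items()
    }
    domain_totals = {d: sum(c.values()) for d, c in counts.items()}
    grand_total = sum(domain_totals.values())
    char_totals = sum(counts.values(), Counter())

    specific = {d: set() for d in corpus}
    for ch, ch_total in char_totals.items():
        for d in corpus:
            # Expected count if the kanji were independent of the domain.
            expected = ch_total * domain_totals[d] / grand_total
            observed = counts[d][ch]
            if expected > 0 and observed > expected:
                contribution = (observed - expected) ** 2 / expected
                if contribution >= threshold:
                    specific[d].add(ch)
    return specific


def classify(text, specific):
    # Score each domain by occurrences of its selected kanji in the text.
    # Ties (including all-zero scores) go to the first domain in the dict.
    counts = kanji_counts(text)
    scores = {d: sum(counts[ch] for ch in chars) for d, chars in specific.items()}
    return max(scores, key=scores.get)
```

With a toy two-domain corpus, `domain_specific_kanji(corpus, threshold=0.5)` returns one kanji set per domain, and `classify(text, specific)` returns the name of the best-scoring domain. A realistic threshold would need to be tuned on much larger training data than a toy example.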