Exploring phrase-based classification of judicial documents for criminal charges in Chinese

Authors:
Chao-Lin Liu;Chwen-Dar Hsieh
Affiliations:
Department of Computer Science, National Chengchi University, Taiwan;Department of Computer Science, National Chengchi University, Taiwan
Venue:
ISMIS'06 Proceedings of the 16th international conference on Foundations of Intelligent Systems
Year:
2006

Citing 9
Cited 0

Automatic phrase indexing for document retrieval. SIGIR '87 Proceedings of the 10th annual international ACM SIGIR conference on Research and development in information retrieval
The use of phrases and structured queries in information retrieval. SIGIR '91 Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval
Fast and quasi-natural language search for gigabytes of Chinese texts. SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Comparing representations in Chinese information retrieval. Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
PAT-tree-based keyword extraction for Chinese information retrieval. Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Foundations of statistical natural language processing
Automatic categorization of case law. Proceedings of the 8th international conference on Artificial intelligence and law
Machine Learning
Classifying criminal charges in Chinese for web-based legal services. APWeb'05 Proceedings of the 7th Asia-Pacific web conference on Web Technologies Research and Development

Quantified Score

Hi-index	0.02

Abstract

Phrases provide a better foundation for indexing and retrieving documents than individual words. Within a phrase, each constituent word makes the other words less ambiguous than they would be in isolation. Intuitively, classifiers that index documents by phrases should therefore perform better than those that index by words. Although pioneers explored phrase-based indexing of English documents decades ago, there have been relatively few similar attempts for Chinese documents, partly because segmenting Chinese text into words correctly is already difficult. We build a domain-dependent word list with the help of Chien's PAT-tree-based method and HowNet, and use the resulting word list to define relevant phrases for classifying Chinese judicial documents. Experimental results indicate that indexing with phrases indeed allows us to classify judicial documents that are closely similar to each other. With a relatively more efficient algorithm, our classifier offers better performance than that reported in related work.
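
The idea of indexing by phrases rather than single words can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not give the details, so everything here is an assumption made for illustration. The word list WORDS is hand-written (the paper builds its list with a PAT-tree-based method and HowNet), segmentation uses plain forward maximum matching, a "phrase" is taken to be an ordered pair of neighbouring listed words, and classification picks the charge whose training phrases overlap most with the document. The labels, the example sentences, and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical domain word list; the paper derives its list automatically.
WORDS = {"持刀", "強盜", "竊取", "財物", "被害人"}


def segment(text, word_list, max_len=4):
    """Forward maximum matching: take the longest listed word at each
    position, falling back to a single character when nothing matches."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in word_list:
                words.append(piece)
                i += n
                break
    return words


def phrase_features(text, word_list):
    """Count ordered pairs of listed words that are neighbours once
    unlisted characters have been dropped (an assumed phrase definition)."""
    kept = [w for w in segment(text, word_list) if w in word_list]
    return Counter(zip(kept, kept[1:]))


def build_profiles(labeled_docs, word_list):
    """Sum the phrase counts of all training documents for each label."""
    profiles = {}
    for label, text in labeled_docs:
        profiles.setdefault(label, Counter()).update(
            phrase_features(text, word_list))
    return profiles


def classify(text, profiles, word_list):
    """Return the label whose profile shares the most phrases with text."""
    feats = phrase_features(text, word_list)

    def overlap(profile):
        return sum(min(count, profile[p]) for p, count in feats.items())

    return max(profiles, key=lambda label: overlap(profiles[label]))


# Toy training data (invented sentences, one per charge).
PROFILES = build_profiles(
    [("robbery", "被告持刀強盜被害人財物"),
     ("theft", "被告竊取被害人財物")],
    WORDS,
)

print(classify("嫌犯持刀強盜財物", PROFILES, WORDS))
print(classify("嫌犯竊取被害人財物", PROFILES, WORDS))
```

In this toy setup both charges share the words 被害人 and 財物, so word counts alone separate them less sharply than word pairs such as (持刀, 強盜) or (竊取, 被害人) do. That is the intuition the abstract appeals to when it says phrases help with documents that are closely similar to each other.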