Exploring phrase-based classification of judicial documents for criminal charges in Chinese

  • Authors: Chao-Lin Liu; Chwen-Dar Hsieh

  • Affiliations: Department of Computer Science, National Chengchi University, Taiwan (both authors)

  • Venue: ISMIS'06 Proceedings of the 16th international conference on Foundations of Intelligent Systems
  • Year: 2006

Quantified Score

Hi-index 0.02

Abstract

Phrases provide a better foundation for indexing and retrieving documents than individual words. The constituents of a phrase make its other component words less ambiguous than when those words appear separately. Intuitively, classifiers that employ phrases for indexing should perform better than those that use words. Although pioneers explored phrase-based indexing of English documents decades ago, there have been relatively few similar attempts for Chinese documents, partly because correctly segmenting Chinese text into words is itself difficult. We build a domain-dependent word list with the help of Chien's PAT tree-based method and HowNet, and use the resulting word list to define relevant phrases for classifying Chinese judicial documents. Experimental results indicate that using phrases for indexing indeed allows us to classify judicial documents that are closely similar to each other. With a relatively more efficient algorithm, our classifier performs better than those reported in related work.
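
The intuition that phrases separate closely similar documents better than individual words can be illustrated with a minimal sketch. It is not the authors' method: it assumes the text is already segmented into words, treats every pair of adjacent words as a "phrase", and scores documents by simple feature overlap. The function names, the toy English tokens, and the labels `charge_A` and `charge_B` are all hypothetical.

```python
from collections import Counter

def word_features(tokens):
    """Index a document by its individual words (bag of words)."""
    return Counter(tokens)

def phrase_features(tokens, n=2):
    """Index a document by runs of n adjacent words (assumed stand-in for phrases)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap(a, b):
    """Count the features shared by two indexed documents."""
    return sum(min(a[k], b[k]) for k in a if k in b)

def classify(tokens, labelled, featurize):
    """Return the label whose example document shares the most features."""
    query = featurize(tokens)
    return max(labelled, key=lambda label: overlap(query, featurize(labelled[label])))

# Toy, pre-segmented documents (hypothetical): same words, different order.
docs = {
    "charge_A": ["drive", "after", "drinking"],
    "charge_B": ["drinking", "after", "drive"],
}
query = ["drive", "after", "drinking"]

# Word indexing cannot tell the two classes apart; phrase indexing can.
word_scores = {label: overlap(word_features(query), word_features(d)) for label, d in docs.items()}
phrase_scores = {label: overlap(phrase_features(query), phrase_features(d)) for label, d in docs.items()}
print(word_scores)
print(phrase_scores)
print(classify(query, docs, phrase_features))
```

In this toy setting the word-based scores tie because both documents contain the same words, while the phrase-based scores differ because word order is preserved inside each phrase. A real system would draw its phrases from a curated word list rather than from every adjacent pair.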