Automatic phrase indexing for document retrieval
SIGIR '87 Proceedings of the 10th annual international ACM SIGIR conference on Research and development in information retrieval
The use of phrases and structured queries in information retrieval
SIGIR '91 Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval
Fast and quasi-natural language search for gigabytes of Chinese texts
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Comparing representations in Chinese information retrieval
SIGIR '97 Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
PAT-tree-based keyword extraction for Chinese information retrieval
SIGIR '97 Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Foundations of statistical natural language processing
Automatic categorization of case law
Proceedings of the 8th international conference on Artificial intelligence and law
Machine Learning
Classifying criminal charges in Chinese for web-based legal services
APWeb'05 Proceedings of the 7th Asia-Pacific web conference on Web Technologies Research and Development
Phrases provide a better foundation for indexing and retrieving documents than individual words. The constituents of a phrase disambiguate one another: each word is less ambiguous within the phrase than when it appears alone. Intuitively, classifiers that index documents by phrases should therefore perform better than those that index by words. Although researchers began exploring phrase-based indexing of English documents decades ago, there have been relatively few similar attempts for Chinese documents, partly because correctly segmenting Chinese text into words is itself difficult. We build a domain-dependent word list with the help of Chien's PAT-tree-based method and HowNet, and use the resulting word list to define relevant phrases for classifying Chinese judicial documents. Experimental results indicate that indexing with phrases indeed allows us to classify judicial documents that are closely similar to one another. With a relatively more efficient algorithm, our classifier performs better than those reported in related work.