Text classification in a hierarchical mixture model for small training sets. Proceedings of the tenth international conference on Information and knowledge management.
Learning Belief Networks in the Presence of Missing Values and Hidden Variables. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning.
The VLDB Journal: The International Journal on Very Large Data Bases.
Combining clustering and co-training to enhance text classification using unlabelled data. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining.
CBC: Clustering Based Text Classification Requiring Minimal Labeled Data. ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining.
On Using Partial Supervision for Text Categorization. IEEE Transactions on Knowledge and Data Engineering.
Stemming and lemmatization in the clustering of Finnish text documents. Proceedings of the thirteenth ACM international conference on Information and knowledge management.
Is linguistic information relevant for the classification of legal texts? ICAIL '05 Proceedings of the 10th international conference on Artificial intelligence and law.
Expert Systems with Applications: An International Journal.
Proceedings of the 24th international conference on Machine learning.
On the relative hardness of clustering corpora. TSD'07 Proceedings of the 10th international conference on Text, speech and dialogue.
Evaluation of internal validity measures in short-text corpora. CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing.
Learning the dimensionality of hidden variables. UAI'01 Proceedings of the Seventeenth conference on Uncertainty in artificial intelligence.
The Bayesian structural EM algorithm. UAI'98 Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence.
This study explores the use of machine learning for case law search in electronic trials. We clustered a corpus of case law documents, automatically generating the classes for a categorizer; these classes are then used when a user uploads new documents to an electronic trial. We selected the TClus algorithm, created by Aggarwal, Gates and Yu, removed its document- and group-discarding features, and added a cluster-division feature. We also introduced a new paradigm, "bag of terms and law references", in place of "bag of words": attributes are generated by using a law-domain thesaurus to detect legal terms and regular expressions to detect law references. The clustering results were evaluated with the Relative Hardness Measure (RH) and the ρ-Measure (RHO), and their statistical significance was assessed with both Wilcoxon's signed-ranks test and the count of wins and losses test. The categorization results were evaluated by human specialists, who compared true and false positives against each document's similarity to the centroid, the cluster size, the quantity and type of attributes in the centroids, and cluster cohesion. The article also discusses attribute generation and its implications for the classification results.
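The "bag of terms and law references" idea can be illustrated with a minimal sketch. The abstract does not give the thesaurus, the citation formats, or the weighting scheme, so everything below is an assumption: `THESAURUS` is a tiny hypothetical stand-in for the law-domain thesaurus, `LAW_REF` is a hypothetical regular expression for references written like "Law 8.112/90" or "art. 5", attributes are weighted by raw counts, and cosine similarity stands in for the unspecified document-to-centroid similarity. This is not TClus and not the authors' implementation; it only shows how such attributes could replace plain words before clustering.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL thesaurus entries; the study used a real law-domain thesaurus.
THESAURUS = {"habeas corpus", "moral damages", "statute of limitations", "appeal"}

# HYPOTHETICAL law-reference pattern: "Law 8.112/90"-style statutes or "art. 5"-style articles.
LAW_REF = re.compile(
    r"\b(law)\s+(\d+(?:\.\d+)*)/(\d{2,4})\b|\b(art)\.\s*(\d+)\b", re.IGNORECASE
)


def law_references(text):
    """Return normalized law-reference attributes found by the regex."""
    refs = []
    for m in LAW_REF.finditer(text):
        if m.group(1):  # statute form: drop thousands separators, keep the year
            refs.append(f"LAW:{m.group(2).replace('.', '')}/{m.group(3)}")
        else:  # article form
            refs.append(f"ART:{m.group(5)}")
    return refs


def legal_terms(text, thesaurus=THESAURUS):
    """Greedy longest-match lookup of (possibly multi-word) thesaurus terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    max_len = max(len(term.split()) for term in thesaurus)
    found, i = [], 0
    while i < len(tokens):
        # Try the longest n-gram first so "statute of limitations" beats shorter terms.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in thesaurus:
                found.append("TERM:" + candidate)
                i += n
                break
        else:  # no term starts at this token
            i += 1
    return found


def bag_of_terms_and_refs(text):
    """Attribute vector: legal terms plus law references, weighted by raw counts."""
    return Counter(legal_terms(text) + law_references(text))


def centroid(vectors):
    """Mean attribute vector of a cluster."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {k: c / len(vectors) for k, c in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse attribute vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    docs = [
        "Appeal against moral damages award, see Law 8.112/90 and art. 5.",
        "The appeal invokes art. 5 and the statute of limitations.",
    ]
    vecs = [bag_of_terms_and_refs(d) for d in docs]
    c = centroid(vecs)
    for v in vecs:
        print(dict(v), round(cosine(v, c), 3))
```

Note that words outside the thesaurus (such as "award" or "invokes") produce no attribute at all, which is the intended contrast with a bag of words: the vector space is restricted to legal vocabulary and citations.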