Automated learning of decision rules for text categorization
ACM Transactions on Information Systems (TOIS)
Inductive learning algorithms and representations for text categorization
Proceedings of the seventh international conference on Information and knowledge management
A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Text Categorization Based on Regularized Linear Classification Methods
Information Retrieval
ECML '93 Proceedings of the European Conference on Machine Learning
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Distributional word clusters vs. words for text categorization
The Journal of Machine Learning Research
Incorporating prior knowledge with weighted margin support vector machines
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Active Learning with Feedback on Features and Instances
The Journal of Machine Learning Research
Learning from labeled features using generalized expectation criteria
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Identifying patent monetization entities
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Law
Hi-index | 0.00 |
We investigate the problem of binary text classification in the domain of legal docket entries. This work presents an illustrative instance of a domain-specific problem where the state-of-the-art Machine Learning (ML) classifiers such as SVMs are inadequate. Our investigation into the reasons for the failure of these classifiers revealed two types of prominent errors which we call conjunctive and disjunctive errors. We developed simple heuristics to address one of these error types and improve the performance of the SVMs. Based on the intuition gained from our experiments, we also developed a simple propositional logic based classifier using hand-labeled features, that addresses both types of errors simultaneously. We show that this new, but simple, approach outperforms all existing state-of-the-art ML models, with statistically significant gains. We hope this work serves as a motivating example of the need to build more expressive classifiers beyond the standard model classes, and to address text classification problems in such non-traditional domains.