This study explores a system that retrieves and classifies the reasons for late mandatory SEC (Securities and Exchange Commission) filings. From the source documents, the system identifies the reasons for the late filing and classifies them into one or more of seven categories. The system can be used by potential investors who have to track a large number of filings concentrated within a day or two. Our results indicate that SEC filings can be quite ambiguous: in a training sample of 600 filings, experienced raters disagreed on a single category in about 30% of the cases. However, allowing classification into more than one category using document-level information yields an accuracy of about 90% in a test sample of 200 filings. We also show that stock market reactions to over 9,000 late filings vary in an intuitive way according to the classified reasons.