Most complex aerospace systems have many text reports on safety, maintenance, and associated issues. The Aviation Safety Reporting System (ASRS) spans several decades and contains over 700,000 reports. The Aviation Safety Action Plan (ASAP) contains over 12,000 reports from various airlines. Problem categorizations have been developed for both ASRS and ASAP to enable identification of system problems. However, repository volume and complexity make human analysis difficult. Multiple experts are needed, and they often disagree on classifications. Even the same person may classify the same document differently at different times as their experience evolves. Consistent classification is necessary to track trends in problem categories over time. A decision support system that classifies documents consistently and quickly over large repositories would therefore be useful. We discuss the results of two algorithms we have developed to classify ASRS and ASAP documents. The first is Mariana, a support vector machine (SVM) whose hyperparameters are optimized by simulated annealing. The second is a classifier built on top of nonnegative matrix factorization (NMF), which attempts to find a model in which documents are represented as additive combinations of nonnegative document features. We tested both methods on ASRS and ASAP documents, with the latter categorized in two different ways. We illustrate the potential of NMF to provide document features that are interpretable and indicative of topics. We also briefly discuss the tool into which we have incorporated Mariana so that human experts can provide feedback on the document categorizations.
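To make the NMF idea concrete, the following is a minimal sketch, not the authors' implementation. It factors a small, made-up document-term count matrix V into nonnegative factors W (document weights) and H (term weights per feature) using the standard Lee-Seung multiplicative updates for squared error. The vocabulary, the counts, the rank of 2, and the helper names (matmul, transpose, nmf, frobenius_error) are all assumptions chosen for illustration; the abstract does not state which NMF variant or classifier was used.

```python
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def nmf(V, rank, iters=500, seed=0, eps=1e-9):
    """Approximate nonnegative V (docs x terms) as W (docs x rank) times
    H (rank x terms) with Lee-Seung multiplicative updates (squared error)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random start so the updates never get stuck at zero.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return W, H

def frobenius_error(V, W, H):
    """Frobenius norm of the reconstruction residual V - W H."""
    WH = matmul(W, H)
    return sum((v - a) ** 2 for rv, ra in zip(V, WH) for v, a in zip(rv, ra)) ** 0.5

# Hypothetical toy data: term counts for five invented reports.
VOCAB = ["engine", "oil", "leak", "runway", "taxi", "clearance"]
V = [
    [3, 2, 2, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 0, 3, 2, 2],
    [0, 0, 0, 2, 3, 1],
    [1, 1, 1, 1, 1, 1],
]

W, H = nmf(V, rank=2)

# Inspect each learned feature by its highest-weighted terms.
for k, row in enumerate(H):
    top = sorted(range(len(VOCAB)), key=lambda j: -row[j])[:3]
    print("feature", k, [VOCAB[j] for j in top])
print("reconstruction error:", round(frobenius_error(V, W, H), 3))
```

Each row of H can be read as a candidate topic through its top-weighted terms, and each row of W gives the nonnegative weights with which those features combine to approximate a document. One plausible way to build a classifier on top, assumed here rather than taken from the paper, is to train any standard classifier on the rows of W instead of on the raw term counts.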