Hybrid singular value decomposition: a model of human text classification

Authors:
Amirali Noorinaeini;Mark R. Lehto;Sze-jung Wu
Affiliations:
School of Industrial Engineering, Purdue University, West Lafayette, IN;School of Industrial Engineering, Purdue University, West Lafayette, IN;School of Industrial Engineering, Purdue University, West Lafayette, IN
Venue:
Proceedings of the 2007 conference on Human interface: Part I
Year:
2007

Abstract

This study compared the accuracy of three Singular Value Decomposition (SVD) based models developed for classifying injury narratives. Two SVD-Bayesian models and one SVD-Regression model were developed to classify bodies of free text. Injury narratives from the 1997 and 1998 US National Health Interview Survey (NHIS), together with the corresponding E-codes assigned by human experts, were used as input to all three models. All methods were evaluated against the E-code categories assigned by the experts. Further experiments showed that the performance of the equidistant Bayes model and the regression model improved as more SVD vectors were used as input. The regression model was also compared to a fuzzy Bayes model. It was concluded that all three models are capable of learning from human experts to accurately assign cause-of-injury codes to injury narratives, with the regression-based model being the strongest of the three, although all three were outperformed by the multiple-word fuzzy Bayes model.
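The general idea of an SVD-Regression classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the toy narratives, the category names, the choice of `k`, and the helper names `build_vocab`, `to_counts`, `fit`, and `predict` are all hypothetical, and ordinary least squares on one-hot category indicators is assumed here as the regression step.

```python
import numpy as np

# Hypothetical toy narratives and category labels (not NHIS data or E-codes).
narratives = [
    "fell off ladder while painting",
    "slipped on ice and fell",
    "fell down stairs at home",
    "cut finger with kitchen knife",
    "cut hand with broken glass",
    "sharp knife cut my thumb",
]
labels = ["fall", "fall", "fall", "cut", "cut", "cut"]


def build_vocab(texts):
    """Map each distinct word to a column index."""
    words = sorted({w for t in texts for w in t.split()})
    return {w: i for i, w in enumerate(words)}


def to_counts(texts, vocab):
    """Build a document-by-term count matrix; unknown words are ignored."""
    X = np.zeros((len(texts), len(vocab)))
    for row, text in enumerate(texts):
        for w in text.split():
            if w in vocab:
                X[row, vocab[w]] += 1.0
    return X


def fit(texts, labels, k):
    """Project narratives onto the top-k SVD vectors, then regress onto labels."""
    vocab = build_vocab(texts)
    X = to_counts(texts, vocab)
    # Keep the k right singular vectors with the largest singular values.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:k].T                      # shape: (n_terms, k)
    Z = X @ basis                         # reduced document representation
    classes = sorted(set(labels))
    # One-hot indicator matrix: one column per category.
    Y = np.array([[1.0 if lab == c else 0.0 for c in classes] for lab in labels])
    Z1 = np.hstack([Z, np.ones((len(texts), 1))])   # add intercept column
    W, *_ = np.linalg.lstsq(Z1, Y, rcond=None)      # least-squares fit
    return vocab, basis, W, classes


def predict(texts, model):
    """Assign each narrative the category with the highest regression score."""
    vocab, basis, W, classes = model
    Z = to_counts(texts, vocab) @ basis
    Z1 = np.hstack([Z, np.ones((len(texts), 1))])
    scores = Z1 @ W
    return [classes[i] for i in scores.argmax(axis=1)]


model = fit(narratives, labels, k=2)
predictions = predict(narratives, model)
print(predictions)
print(predict(["fell from ladder", "cut thumb with knife"], model))
```

Raising `k` feeds more SVD vectors to the regression step, which is the knob the abstract refers to when it reports that performance improved as more SVD vectors were used.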