Improved variable and value ranking techniques for mining categorical traffic accident data

Authors:
Huanjing Wang; Allen Parrish; Randy K. Smith; Susan Vrbsky
Affiliations:
Department of Computer Science, The University of Alabama, Box 870290, Tuscaloosa, AL 35487-0290, USA (all authors)
Venue:
Expert Systems with Applications: An International Journal
Year:
2005

Abstract

The ever-increasing size of datasets used in data mining and machine learning applications has placed renewed emphasis on algorithm performance and processing strategies. This paper addresses algorithms for ranking the variables in a dataset and for ranking the values of a specific variable. We propose two new techniques, Max Gain (MG) and Sum Max Gain Ratio (SMGR), which correlate well with existing techniques yet are much more intuitive. MG and SMGR were developed for the public safety domain using categorical traffic accident data. Unlike typical abstract statistical techniques for ranking variables and values, the proposed techniques can be motivated as intuitive, useful metrics for domain practitioners who are not statisticians. In addition, the proposed techniques are generally more efficient than the more traditional statistical approaches.