A Practical Approach to Variable Selection — A Comparison of Various Techniques

Abstract
Selecting a useful list of candidate variables for a predictive model is a critical step in the modeling process and can result in better models. Sifting through a long list of candidate variables can be onerous and ineffective, particularly given the increasingly wide variety of external factors now available from third-party providers. This paper explores a variety of variable selection techniques, applied to frequency and severity models of homeowners insurance claims developed on a dataset with over 350 initial candidate variables. The techniques are evaluated against multiple criteria, including ease of use and the predictive power of the resulting model, measured on out-of-sample data. A method based on Elastic Net performs well. For sufficiently long shortlists, random selection performs as well as some of the more sophisticated methods.

Keywords: variable selection, frequency and severity models, homeowners, Elastic Net regularization
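
As a rough illustration of what Elastic Net based shortlisting can look like, the sketch below fits an Elastic Net penalized linear regression by coordinate descent and keeps the variables with the largest absolute coefficients. It is not the procedure used in the paper, which this page does not describe. The Gaussian linear model, the synthetic data, the penalty settings (lam, alpha), and the names soft_threshold, elastic_net, and shortlist are all assumptions made for the example.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def elastic_net(X, y, lam, alpha, n_iter=100):
    """Coordinate descent for the penalized least-squares objective
    (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2).
    Assumption: a Gaussian linear model stands in for the frequency and
    severity models mentioned in the abstract."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    # Center columns and response so no intercept term is needed.
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residual when all coefficients are 0
    z = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant column carries no information
            # Covariance of column j with the residual that excludes column j.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + z[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (z[j] + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    return beta

# Synthetic data (an assumption): 10 candidates, only columns 0 and 3 matter.
random.seed(0)
n, p = 200, 10
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[3] + random.gauss(0.0, 0.5) for row in X]

beta = elastic_net(X, y, lam=0.3, alpha=0.5)
# Shortlist: the two candidates with the largest absolute coefficients.
ranked = sorted(range(p), key=lambda j: abs(beta[j]), reverse=True)
shortlist = ranked[:2]
print("coefficients:", [round(b, 3) for b in beta])
print("shortlist (column indices):", sorted(shortlist))
```

In practice the penalty strength would be tuned on held-out data, in line with the out-of-sample evaluation the abstract mentions, and a shortlist length would be chosen to suit the modeling budget rather than fixed at two.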

Volume: Summer
Page: 1-20
Year: 2015
Categories: Financial and Statistical Methods > Loss Distributions > Frequency; Financial and Statistical Methods > Loss Distributions > Severity; Business Areas > Homeowners
Publications: Casualty Actuarial Society E-Forum
Authors: Alessandro Santoni