Exams IRL: MAS II
The purpose of the “Exams in Real Life” series is to share how content from CAS exams is used in the workplace today. In essence, we would like to supply a little motivation by answering “Why am I learning this stuff?” and “When am I ever going to use any of it?” If you have not already done so, please take a moment to read our prior articles (see Exams IRL Archive below).
For this issue, we’re focusing on MAS II – Modern Actuarial Statistics II.
The material on MAS II introduces candidates to a powerful list of skills!
The value of this exam’s material is in the name: MODERN. The main reason why the actuarial career is so dynamic is that we’re always trying to progress. With the innovations in data storage capabilities and computing power, our analysis can, and must, advance. Improved predictive analytics and the benefits of Bayesian thinking are making actuarial estimates more accurate and granular. We are in the middle of an actuarial renaissance! MAS II, along with MAS I, the Intro to Data & Analytics course, and the upcoming Predictive Analytics module, helps prepare candidates for the future of actuarial analysis.
MAS II is split into four beneficial topics: credibility, linear mixed models, statistical learning and time series. Let’s discuss!
Introduction to credibility
Credibility is a time-honored actuarial practice. Often, we make actuarial estimates using internal data that may be too thin or too noisy to be considered fully credible, particularly as we attempt to predict at more granular levels. When this is the case, we would like to weight our internal estimates with a related, more stable value or an underlying assumption about how the data behave. This weighted estimate provides a credible number to use in actuarial methods, such as indications. For example, we may feel our data is too thin to fully rely on an estimate of expected losses for use in a state indication, so we may want to credibility-weight the estimate with a countrywide expected loss, putting 40% weight on our state loss estimate and 60% on the countrywide. The resulting credibility-weighted expected loss will be a more stable number to use in our indication.
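The state and countrywide example above can be sketched in a few lines of Python. This is only an illustration: the function name and the two loss estimates are hypothetical, and only the 40%/60% split comes from the example.

```python
def credibility_weighted(internal: float, complement: float, z: float) -> float:
    """Blend an internal estimate with a more stable complement using weight z."""
    if not 0.0 <= z <= 1.0:
        raise ValueError("z must be between 0 and 1")
    return z * internal + (1.0 - z) * complement


# Hypothetical expected losses per exposure (illustrative values only).
state_estimate = 500.0
countrywide_estimate = 400.0

# 40% weight on the state estimate, 60% on the countrywide estimate.
blended = credibility_weighted(state_estimate, countrywide_estimate, z=0.40)
print(blended)
```

The blended value always falls between the two inputs, which is why it is more stable than the state estimate alone.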
MAS II discusses how we decide the weights (Z) to apply when calculating a credibility-weighted estimate. The classical technique (limited fluctuation) is currently used for the ratemaking methods in most lines of business. It is simple and reasonably accurate for most risks. The method may cause inaccuracies, however, at the fringes (e.g., super safe policyholders and/or super risky policyholders). For improved accuracy, actuaries are increasingly delving into Bayesian methods. MAS II discusses Bühlmann, Bühlmann-Straub and general Bayesian procedures. These methods allow one to make assumptions about how our data behave, then update those assumptions with internal experience, either by calculating Z or by creating a predictive distribution to use for our estimates going forward. Understanding and applying these Bayesian-based methods will be vital in improving actuarial expected frequency, severity and aggregate loss estimates, in particular. Better predictions on a more granular scale lead to favorable selection and fairer premiums for policyholders.
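As a minimal sketch of how Z might be calculated under the two approaches, the Python below uses the square-root rule for limited fluctuation credibility and Z = n / (n + k) for Bühlmann credibility. The function names and inputs are hypothetical. The full credibility standard assumes Poisson claim counts and, purely for illustration, a 90% probability of being within 5% of the expected value.

```python
import math


def full_credibility_standard(z_value: float = 1.645, tolerance: float = 0.05) -> float:
    """Expected claim count for full credibility of frequency (Poisson assumption)."""
    return (z_value / tolerance) ** 2


def limited_fluctuation_z(n_claims: float, n_full: float) -> float:
    """Square-root rule for partial credibility, capped at 1."""
    return min(1.0, math.sqrt(n_claims / n_full))


def buhlmann_z(n: float, epv: float, vhm: float) -> float:
    """Buhlmann credibility Z = n / (n + k), where k = EPV / VHM."""
    k = epv / vhm  # expected process variance over variance of hypothetical means
    return n / (n + k)


# Hypothetical inputs for illustration only.
n_full = full_credibility_standard()
print(limited_fluctuation_z(n_claims=400, n_full=n_full))
print(buhlmann_z(n=5, epv=100.0, vhm=20.0))
```

In the Bühlmann formula, Z rises toward 1 as the volume of experience n grows, or as the risks differ more from one another (a larger VHM relative to EPV).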
Linear mixed models
To charge fair premiums to our policyholders, we need to be able to differentiate between them in a credible manner. Classifying risks into homogeneous groups for a series of selected policyholder characteristics and calculating the relative average expected losses for each level allows us to create a multiplicative algorithm called a class plan. The purpose of the class plan is to calculate each policyholder’s fair premium, based on their individual characteristics. The more we split up the data into groups, however, the less credible our predicted losses become. This creates a dilemma: We want to classify our risks properly, but we also want credible estimates. As such, our groupings can’t be too granular. However, risks within each group still differ from each other, so we aren’t fully capturing the true classification. For this dilemma, we can benefit from the use of models, like linear mixed models (LMMs).
LMMs incorporate fixed and random effects, meaning we can model the relationship between a set of independent variables and the response variable (e.g., policyholder characteristics versus expected loss), while also modeling the random variation among groups of risks that the fixed effects do not capture. Along with other tools, like generalized linear models, LMMs help us differentiate our policyholders more effectively in our class plans. LMMs can also be used in book of business analysis, planning and more.
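As a rough illustration of fixed versus random effects, a simple random-intercept LMM can be written as below. The notation is an assumption made for this sketch rather than anything prescribed by the exam material: y_ij is the response for risk i in group j, x_ij is a policyholder characteristic, and u_j is the random effect for group j.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim N(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma^2)
```

Here the coefficients \beta_0 and \beta_1 are fixed effects shared by all risks, while u_j shifts the expected response for group j and \varepsilon_{ij} is the remaining noise for the individual risk.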
Statistical learning
Along with LMMs, there is a plethora of helpful statistical tools to improve our actuarial methods and analysis. MAS II introduces many of these, including k-nearest neighbors, decision trees (CART), random forests, gradient boosting machines (GBM), principal components analysis (PCA), clustering (k-means, hierarchical) and neural networks, along with the interpretation of summary statistics.
These tools are used for a variety of purposes. For example, tree-based methods are used to help identify significant variables for use in models (e.g., which policyholder characteristics are most predictive of expected loss), while PCA condenses many correlated variables into a smaller set of components. Clustering methods are used to help identify splits between levels within a variable (e.g., territories in a state).
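To make the clustering idea concrete, here is a minimal k-means sketch in plain Python that groups ZIP-code loss costs into three territories. The data, the starting centers and the function name are all hypothetical, and a real territory analysis would likely also consider geography and more than one dimension.

```python
def kmeans_1d(values, centers, iterations=20):
    """Basic k-means (Lloyd's algorithm) on one-dimensional data."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group.
        new_centers = [
            sum(g) / len(g) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break  # assignments have stopped changing
        centers = new_centers
    return centers, groups


# Hypothetical loss costs by ZIP code (illustrative values only).
loss_costs = [100, 110, 105, 300, 310, 295, 600, 620]
centers, groups = kmeans_1d(loss_costs, centers=[100, 300, 600])
print(centers)
print(groups)
```

Each resulting group is a candidate territory, and its center is the average loss cost of the ZIP codes assigned to it.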
The ability to interpret statistics (such as p-values, AIC, R-squared, Gini and lift charts) is necessary for understanding variable significance and model predictive accuracy. We must decide which models are best for actuarial estimation, with confidence that the selection of variables and the resulting predictions are trustworthy.
Time series
Much of the data used for actuarial estimation has a time component, where prior data points naturally lead to subsequent data points (i.e., the data is correlated through time). This autocorrelation, along with long-term trend and seasonality components, provides helpful patterns in predicting future behavior.
Time series models help to identify these patterns. MAS II will teach you the autoregressive integrated moving average (ARIMA) model in particular. The “AR” component models the autocorrelation by regressing the series on its own prior values, the “I” component applies a data transform (differencing) that removes trend so the series is stationary, and the “MA” component models the current value as a function of prior random shocks (forecast errors), not as a smoothed long-term average. Seasonality is typically handled with additional seasonal terms.
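The differencing and autoregressive pieces can be sketched in plain Python as follows. This is a simplified ARIMA(1,1,0)-style illustration fitted by ordinary least squares. It omits the MA term to stay short, the function names and the data are hypothetical, and in practice a statistical package would be used to fit the full model.

```python
def difference(series):
    """The "I" step: first differences of the series."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(series):
    """The "AR" step: least-squares fit of x_t = c + phi * x_{t-1} + error."""
    x_prev, x_curr = series[:-1], series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_curr = sum(x_curr) / n
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(x_prev, x_curr))
    sxx = sum((p - mean_prev) ** 2 for p in x_prev)
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast_next(series):
    """One-step forecast: predict the next difference, then undo the differencing."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    next_diff = c + phi * diffs[-1]
    return series[-1] + next_diff


# Hypothetical quarterly average premium (illustrative values only).
premium = [100.0, 104.0, 106.0, 111.0, 114.0, 120.0, 122.0]
print(difference(premium))
print(forecast_next(premium))
```

Differencing turns the upward-drifting premium series into a series of period-to-period changes, and the regression then captures how each change relates to the one before it.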
ARIMA models are underutilized! They have powerful possibilities in trend selections, such as premium, loss, exposure and expense trends. ARIMA models can be used to identify drivers of change. They can be used in book of business analysis, planning and much more.