The Risk Management of Data Science
Actuaries have been using data science techniques for years. While the statistical methods are not new, there is now exponentially more computing power available. New technologies come with new risks. Thankfully, the theoretical concepts tested in actuarial exams have also prepared you to navigate the following data science pitfalls.
Biased data create biased algorithms
If a machine-learning algorithm is trained using biased data, it is going to produce biased results. For example, Reuters reported that Amazon recently scrapped a resumé screening tool because it discriminated against women. Based on resumés submitted to Amazon, the tool was reportedly less likely to recommend a resumé that included the word "women's" (e.g., "Women's Soccer Team"). An algorithm is expected to be more objective than a human. As it turns out, an algorithm is subject to the same influences that lead humans to biased decisions. There is also a danger that a model, by tending toward the average of its training data, ends up underfitted and simply learns to pick the status quo.
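The mechanism can be illustrated with a minimal sketch. Everything in it is hypothetical: the `history` records, the counts, and the simple frequency-based "model" are invented for illustration and are not a description of Amazon's tool. The sketch assumes past decisions hired equally qualified candidates less often when a certain keyword appeared on the resumé, and shows that a model fitted to those decisions reproduces the same pattern.

```python
from collections import defaultdict

# Hypothetical historical decisions: (has_keyword, qualified, hired).
# All candidates here are equally qualified, but those whose resume
# contains the keyword were hired less often in the past.
history = (
    [(False, True, True)] * 80 + [(False, True, False)] * 20
    + [(True, True, True)] * 40 + [(True, True, False)] * 60
)

def fit_hire_rates(records):
    """Learn the historical hire rate for each (has_keyword, qualified) group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [number hired, total]
    for has_keyword, qualified, hired in records:
        group = (has_keyword, qualified)
        counts[group][0] += int(hired)
        counts[group][1] += 1
    return {group: hired / total for group, (hired, total) in counts.items()}

def recommend(rates, has_keyword, qualified, threshold=0.5):
    """Recommend a candidate when the learned hire rate meets the threshold."""
    return rates[(has_keyword, qualified)] >= threshold

rates = fit_hire_rates(history)
print(recommend(rates, has_keyword=False, qualified=True))
print(recommend(rates, has_keyword=True, qualified=True))
```

Under these assumed counts, the two candidates differ only in the keyword, yet the fitted model recommends one and not the other. The model has not discovered anything about qualifications; it has only memorized the pattern in the past decisions it was given.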
Ask the right question and listen to the data
With so many datasets available, the issue at hand is less about finding the answer and more about asking the right questions. Consider the following story, taken from Robot Vision by Berthold K.P. Horn:
A Fairy Tale
Once upon a time there were two neighboring farmers, Jed and Ned. Each owned a horse, and the horses both liked to jump the fence between the two farms. Clearly the farmers needed some means to tell whose horse was whose.
So, Jed and Ned got together and agreed on a scheme for discriminating between the horses. Jed would cut a small notch in one ear of his horse. Not a big, painful notch, but one just big enough to be seen. Well, wouldn't you know it, the day after Jed cut the notch in his horse's ear, Ned's horse got caught on the barbed wire fence and tore his ear the exact same way! Something else had to be devised, so Ned tied a big blue bow on the tail of his horse. But the next day, Jed's horse jumped the fence, ran into the field where Ned's horse was grazing, and chewed the bow right off the other horse's tail. Ate the whole bow!
Finally, Jed suggested, and Ned concurred, that they should pick a feature that was less apt to change. Height seemed like a good feature to use. But were the heights different? Well, each farmer went and measured his horse, and do you know what? The brown horse was a full two inches taller than the white one!
The moral of the story, as stated by the author, is: "When you have difficulty in classification, do not look for ever more esoteric mathematical tricks; instead, find better features." When implementing analytics, the focus should be on listening to the data. Too often, data science is misused by massaging the data or overfitting a model until it confirms the desired answer.
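The point about features can be made concrete with a small sketch. The observations below are invented: the story does not say which farmer owns which horse or how precisely the heights were measured, so the owner labels and the noisy height readings are assumptions chosen only so that the average heights differ by two inches. The sketch compares the best possible height cutoff with a rule based on color.

```python
# Hypothetical observations: (measured_height_inches, color, owner).
# Heights are assumed to be noisy; the assignment of colors to owners is
# also an assumption made for illustration.
observations = [
    (60, "brown", "Jed"), (61, "brown", "Jed"),
    (62, "brown", "Jed"), (63, "brown", "Jed"),
    (58, "white", "Ned"), (59, "white", "Ned"),
    (60, "white", "Ned"), (61, "white", "Ned"),
]

def accuracy(predict, rows):
    """Share of rows where the predicted owner matches the true owner."""
    correct = sum(predict(height, color) == owner for height, color, owner in rows)
    return correct / len(rows)

def best_height_threshold(rows):
    """Pick the height cutoff that classifies the most rows correctly."""
    candidates = sorted({height for height, _, _ in rows})
    return max(
        candidates,
        key=lambda t: accuracy(lambda h, c: "Jed" if h >= t else "Ned", rows),
    )

threshold = best_height_threshold(observations)
height_acc = accuracy(lambda h, c: "Jed" if h >= threshold else "Ned", observations)
color_acc = accuracy(lambda h, c: "Jed" if c == "brown" else "Ned", observations)

print(f"height rule accuracy: {height_acc:.2f}")
print(f"color rule accuracy:  {color_acc:.2f}")
```

With these assumed readings, no height cutoff separates the horses perfectly because the measurements overlap, while the color rule makes no mistakes. For simplicity the sketch scores each rule on the same rows used to choose it; a real analysis would hold out data for evaluation.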
Past CAS President Brian Brown has described actuaries as the original data scientists: they use credibility theory to incorporate data, arrive at results that are not unfairly discriminatory, and consider outcomes that may not be present in historical data. Actuaries have a professional duty to ensure decisions are based on reliable data. You can embrace data science and avoid its pitfalls by approaching it with the same professionalism.