Deploying machine learning in insurance pricing
19 April 2017
Insurers have been investigating the deployment of machine learning techniques in pricing, seeking to exploit the speed at which models can be built and refreshed compared with more traditional generalised linear modelling (GLM) techniques.
Machine learning techniques offer significant advantages over these traditional models, including a variety of non-linear model types that can yield a wide range of new insights. However, these new models are more difficult to explain to brokers and customers (especially when they suggest significant changes from the expiring prices), and management and marketing teams show a degree of resistance to what might be viewed as a black-box technology.
At the same time, the actual improvements these new models offer over traditional ones may not be as marked as the hype surrounding machine learning might suggest.
So how can these twin challenges be overcome: first, how to deploy these new pricing models in a way that can be readily understood by all stakeholders; and second, how to actually get the most out of these new types of model?
Tackling the first of these challenges, some of the more sophisticated machine learning platforms now offer the ability to “convert” these new models so that they mimic the existing models in production, where the rating factors are more readily explainable. This allows the new generation of models to be put into production in a manner that fits existing, well-understood processes. It also has the significant advantage of being more readily explainable to brokers and customers, particularly when the inevitable questions arise as to why premium rates have risen on renewal - naturally, customers and brokers are less anxious when prices have fallen. However, embedding these new models into the business will require both:
- a robust governance framework, as these new models may prove very disruptive in terms of the extent of differences in the perception of riskiness between old and new models for any given customer or insured risk. Any particularly radical changes will therefore need to be challenged and moderated, and may also need to be bled in slowly over time;
- some degree of change management, and the provision to frontline staff of the necessary tools to respond to broker and policyholder queries which will inevitably arise from the introduction of any new rating structures, particularly those that represent a significant shift from predecessor models.
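To make the “conversion” idea above concrete, one simple approach is to score a black-box model across the rating grid and extract a familiar base-rate-times-relativities structure from its predictions. The sketch below is purely illustrative: the black-box function, factor levels and coefficients are all made up for the example, and a real exercise would use a fitted GBM or similar and far more rating factors.

```python
from itertools import product

# Hypothetical "black box" pricing model: a nonlinear function of two
# rating factors (driver age band and vehicle group), including an
# interaction. In practice this would be a fitted GBM or similar.
def black_box_premium(age_band: int, vehicle_group: int) -> float:
    base = 400.0
    premium = base * (1.6 - 0.2 * age_band) * (0.8 + 0.15 * vehicle_group)
    if age_band == 0 and vehicle_group == 2:
        premium *= 1.1  # interaction: young drivers in large vehicles
    return premium

AGE_BANDS = [0, 1, 2]       # illustrative, e.g. 17-25, 26-50, 51+
VEHICLE_GROUPS = [0, 1, 2]  # illustrative, e.g. small, medium, large

# Score the black box over the whole rating grid.
grid = {(a, v): black_box_premium(a, v)
        for a, v in product(AGE_BANDS, VEHICLE_GROUPS)}

overall = sum(grid.values()) / len(grid)

# One-way relativities: the average prediction at each factor level,
# divided by the overall average -- i.e. a traditional rating table
# distilled from the black box's behaviour.
age_rel = {a: (sum(grid[(a, v)] for v in VEHICLE_GROUPS)
               / len(VEHICLE_GROUPS)) / overall
           for a in AGE_BANDS}
veh_rel = {v: (sum(grid[(a, v)] for a in AGE_BANDS)
               / len(AGE_BANDS)) / overall
           for v in VEHICLE_GROUPS}

def table_premium(age_band: int, vehicle_group: int) -> float:
    """Surrogate premium from the extracted multiplicative rating table."""
    return overall * age_rel[age_band] * veh_rel[vehicle_group]
```

The surrogate will not reproduce interactions exactly (here the young-driver/large-vehicle loading is smeared across the one-way relativities), which is precisely the trade-off between explainability and fidelity that the governance framework needs to weigh.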
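The “bled in slowly” point above can be implemented as a simple cap on the year-on-year premium movement towards the new model's target. The cap of 10% below is an arbitrary illustration; the actual figure is a governance decision.

```python
def bleed_in_premium(old_premium: float, target_premium: float,
                     cap: float = 0.10) -> float:
    """Move from the expiring premium towards the new model's target,
    capping the year-on-year change at +/- `cap` (10% here, purely
    illustrative)."""
    upper = old_premium * (1 + cap)
    lower = old_premium * (1 - cap)
    return min(max(target_premium, lower), upper)

# A renewal where the new model wants a 40% increase is instead
# stepped towards the target over successive renewals.
old, target = 500.0, 700.0
year1 = bleed_in_premium(old, target)    # capped at +10% of 500
year2 = bleed_in_premium(year1, target)  # moves further towards 700
```

Repeated application converges on the new model's view over several renewal cycles, giving frontline staff a simple, explainable story for each year's change.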
In relation to the second of these challenges, it is a common myth that applying supervised or unsupervised machine learning techniques to the same dataset will yield vastly different and improved results. Typically, the real benefit is that these new challenger models can be up and running in as little as 24 hours, provided the same scrubbed dataset is available for use - a significant reduction in the elapsed time from start to finish of a pricing study. To truly enhance risk segmentation and pricing models, however, the real battleground lies in extensive data augmentation. Those insurers that can use big data, extract value from the unstructured data they already hold, draw on valuable customer information from non-insurance sources, and eke out knowledge their competitors cannot, will continue to enjoy an advantage over their peers.
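In practice, augmentation often starts with a simple enrichment join between internal policy records and an external dataset. The sketch below is a toy illustration - the fields, postcodes and flood scores are invented - but it shows one point that matters operationally: unmatched records should be flagged rather than silently defaulted, so the coverage of the external source can be reviewed.

```python
# Internal policy records (illustrative fields and values).
policies = [
    {"policy_id": "P001", "postcode": "SW1A", "premium": 420.0},
    {"policy_id": "P002", "postcode": "M1",   "premium": 380.0},
    {"policy_id": "P003", "postcode": "XX9",  "premium": 510.0},
]

# Hypothetical external dataset keyed on postcode area, e.g. flood
# risk scores licensed from a third party.
external_flood_scores = {"SW1A": 0.12, "M1": 0.45}

def augment(policy: dict) -> dict:
    """Attach the external score; flag non-matches explicitly rather
    than silently defaulting, so match rates can be monitored."""
    enriched = dict(policy)
    score = external_flood_scores.get(policy["postcode"])
    enriched["flood_score"] = score
    enriched["flood_score_matched"] = score is not None
    return enriched

augmented = [augment(p) for p in policies]
```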
Data augmentation is part art and part science, and many insurers have yet to tap fully into the data and information at their disposal within both internal systems and paper-based records - such as policy documentation and claims files, which include medical reports, loss adjuster reports and other specialist assessments - let alone utilise external data sets. Insurers have already started to benefit from optical character recognition technology to digitise paper documents, creating new data sets which can be interrogated for valuable information. Some of the biggest advances have been made in digitising case handlers’ notes and medical reports, as this allows predictive models to be built around eventual liability amounts and generates valuable information to help triage claims more effectively.
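Once claims notes are digitised, even a crude text-based score can support triage. The keyword weights and routing threshold below are invented for illustration; a real model would be trained on historical notes and settled claim amounts rather than hand-picked terms.

```python
import re

# Toy severity keywords and weights -- purely illustrative, not a
# trained model.
SEVERITY_WEIGHTS = {
    "fracture": 3.0,
    "surgery": 4.0,
    "whiplash": 1.5,
    "solicitor": 2.0,
    "hospital": 2.5,
}

def triage_score(note: str) -> float:
    """Crude severity score for a digitised claims note: sum the
    weights of the severity keywords present in the text."""
    tokens = set(re.findall(r"[a-z]+", note.lower()))
    return sum(w for kw, w in SEVERITY_WEIGHTS.items() if kw in tokens)

def triage(note: str, threshold: float = 4.0) -> str:
    """Route high-scoring claims to a senior handler (illustrative
    threshold)."""
    if triage_score(note) >= threshold:
        return "senior handler"
    return "standard queue"
```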
In summary, to get the maximum benefit out of the deployment of machine learning tools, effective governance arrangements need to be in place to manage the transition to new rating structures, and appropriate data augmentation is critical to getting the most value out of these technologies and techniques.