http://www.risk.net/insurance-risk/feat ... tured-data
Insurance risk managers draw on unstructured data
Author: Clive Davidson
Source: Insurance Risk | 10 Apr 2014
Insurers operate in a world of unstructured and unsorted data with gigabytes of information at their disposal, some of which is barely used. The latest analytical tools, advances in computing power and a new breed of data scientists are helping firms make more use of this powerful resource in their risk management. Clive Davidson reports
Like the physical universe, the data universe is vast and expanding at an enormous rate. Already, total data volumes must be measured in zettabytes – that is trillions of gigabytes. And the amount of information that is being created globally is now doubling around every two years. While some of this data is stored in orderly collections residing in conventional databases, the majority is a more random assortment of text, audio, images and video. Only a tiny fraction of this data is analysed – some put it at around 1%.
Now, however, advances in analytical tools and computing power, together with the emerging discipline of ‘data science’, allow all these disparate data sources to be combined and investigated. The richness of the information pools and the power of the new tools not only bring greater depth to traditional analysis, but also enable new factors and relationships to be discovered. This is allowing insurers to undertake a more sophisticated analysis of risk and to gain greater insight into customer requirements and behaviour.
Traditionally, like most business data, insurance data is organised and stored in relational database management systems where the nature of the records, data fields and relationships between them are specified in advance – for instance, a policyholder database contains records with name, date, product type and premium amount. This structured information can then be accessed and searched using a structured query language such as the widely used standard SQL, which specifies what information is to be found where and how the results of the query should be ordered.
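As a minimal sketch of the kind of structured query described above, the following Python example uses the standard-library sqlite3 module with an in-memory table. The table name, column names and sample rows are illustrative assumptions, not taken from any insurer's system.

```python
import sqlite3

def query_policies(min_premium):
    """Return (name, product_type, premium) rows at or above min_premium, highest first."""
    conn = sqlite3.connect(":memory:")
    # The schema is fixed in advance: every record has the same four fields.
    conn.execute(
        "CREATE TABLE policyholder ("
        "name TEXT, start_date TEXT, product_type TEXT, premium REAL)"
    )
    # Hypothetical sample records for illustration only.
    conn.executemany(
        "INSERT INTO policyholder VALUES (?, ?, ?, ?)",
        [
            ("A. Example", "2013-01-15", "motor", 420.0),
            ("B. Sample", "2012-06-01", "life", 950.0),
            ("C. Placeholder", "2014-02-20", "home", 310.0),
        ],
    )
    # The query states what to find, where to find it and how to order it.
    rows = conn.execute(
        "SELECT name, product_type, premium FROM policyholder "
        "WHERE premium >= ? ORDER BY premium DESC",
        (min_premium,),
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in query_policies(400):
        print(row)
```

The query only works because the fields were declared before any data arrived, which is the constraint that unstructured sources do not satisfy.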
While relational databases and SQL have provided the backbone of data management for insurance and other industries over the past few decades, they are poorly suited to material such as business documents, news reports, scientific research, emails, social media exchanges and call centre records. Although vast quantities of such information are now available digitally, it has no inherent structure or consistency. Analysing this unstructured data therefore requires specialised database systems and tools, as well as increased computing power to cope with the sheer volume of information.
Over the past few years, a wide variety of what are collectively known as ‘NoSQL’ databases has emerged, so called because they store and make accessible information using mechanisms other than relational tables. NoSQL databases use concepts such as key-value pairs, objects and documents to organise the information they hold. For example, MongoDB is a document-oriented database that is widely used by organisations such as eBay, Craigslist and The New York Times. The data in NoSQL databases is investigated using programming and analytical languages such as Java, C++, Python and R.
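To illustrate the document and key-value ideas, here is a small sketch in plain Python. It deliberately uses ordinary dictionaries rather than MongoDB or any other real NoSQL product, and the record contents, field names and the find helper are all hypothetical.

```python
import json

# Hypothetical "documents": each record carries its own set of fields,
# so no schema has to be agreed before the data is stored.
documents = [
    {"_id": 1, "type": "call_note", "policy": "P-001",
     "text": "Customer asked about surrender value"},
    {"_id": 2, "type": "email", "policy": "P-002",
     "subject": "Change of address", "attachments": ["form.pdf"]},
    {"_id": 3, "type": "call_note", "policy": "P-001",
     "text": "Follow-up call", "duration_sec": 240},
]

def find(docs, **criteria):
    """Return documents that contain every named field with the given value."""
    return [
        d for d in docs
        if all(k in d and d[k] == v for k, v in criteria.items())
    ]

# Key-value view: fetch a document directly by its identifier.
by_id = {d["_id"]: d for d in documents}

if __name__ == "__main__":
    for doc in find(documents, type="call_note", policy="P-001"):
        print(json.dumps(doc))
```

Documents that lack a queried field are simply skipped, so records of different shapes can sit in the same collection.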
Many of the new generation of tools for unstructured data were initially developed to enable search engines such as Yahoo and Google to tackle the vast resources of the web. Key among these is the Hadoop framework for the management and processing of large-scale disparate datasets on clusters of commodity hardware. Hadoop has a number of modules for such things as distributing data across groups of processors, filtering, sorting and summarising information, and automatically handling the inevitable hardware failures that arise in large computing grids. All of the technologies mentioned are open source, which means they are free and readily available, and they are also supported by many proprietary commercial extensions and equivalents.
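The filtering, sorting and summarising steps are commonly expressed in Hadoop through its MapReduce programming model. The sketch below imitates that pattern inside a single Python process to show the shape of the computation. It does not call Hadoop, and the function names and sample text are assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(record):
    """Emit a (word, 1) pair for each word in one text record."""
    for word in record.lower().split():
        yield word.strip(".,;:!?"), 1

def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        if key:  # drop tokens that were punctuation only
            grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Summarise all values collected for one key."""
    return key, sum(values)

def word_count(records):
    """Run map, shuffle and reduce over the records; keys are handled in sorted order."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))

if __name__ == "__main__":
    notes = ["Claim delayed, claim disputed.", "Delayed payment"]
    print(word_count(notes))
```

In a real cluster the map and reduce calls would run on different machines, with the framework moving data between them and rerunning tasks lost to hardware failure.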
The breakthrough with new data sources and tools is the ability to query things for which the data has not been organised in advance. This can reveal new patterns, trends and correlations that can be helpful in managing risk and spotting opportunities, says Neil Cantle, principal and consulting actuary at Milliman, based in London.
But the tools are just one part of exploiting the new data sources. Also required is a new breed of professionals (so-called data scientists) who have a skill set that combines expertise in the new data management and analytical tools with a deep understanding of the insurance industry and an ability to translate between the two. “Domain expertise is hugely important to avoid drawing inappropriate conclusions from the data due to a lack of knowledge about how the insurance industry works,” says Mike Wilkinson, a director of the insurance management consultancy at Towers Watson, based in London.
Eric Huls, senior vice-president of quantitative research and analytics at Illinois-based Allstate Insurance, agrees. “Data scientists need to have the domain knowledge to understand key business problems, the ability to translate those business problems into maths problems, the ability to obtain and perform analysis on data to solve those maths problems, and the ability to then translate that mathematical solution back into a business solution,” he says. “It’s a demanding, but incredibly valuable skill set.”
Allstate has been a pioneer in predictive analytics – the search for patterns and trends in large, structured datasets. Predictive analytics has been used widely by insurers since the late 1990s for pricing and underwriting in areas where there are large volumes of frequently updated information, such as auto or property insurance.
“While we’re still using predictive analytics for pricing and underwriting, we’re light years ahead of where we were in the ’90s, both because of the amount of data we have at our disposal, and the explosion in computing power that allows us to handle the influx of data in ways that simply weren’t technologically feasible then,” Huls says. Analyses that would have taken the company weeks are now being done in minutes. “And the ability to iterate quickly allows you to learn so much faster than before,” he says.
Massachusetts-based MassMutual is another company that is pushing on from predictive analytics into the realm of unstructured data. Over the past six months, the company has hired five data scientists and plans to grow its team further over the course of the year. “Insurance is a highly regulated industry and, as a result, the general tendency is to capture and gather tremendous amounts of information both in structured and unstructured formats. Now with the existence of tools that can integrate and utilise all of this information, the industry is starting to be predictive and prescriptive in nature as well,” says Amit Phansalkar, vice-president and chief data scientist at MassMutual.
One of the first applications of the new generation of data management tools in the industry is the emergence of telematics motoring insurance, where a policyholder’s driving is monitored and the information relayed to the insurer, who adjusts premiums accordingly. Also known as usage-based insurance or pay-as-you-drive, telematics-based insurance is offered by Allstate in the US, Admiral, Ingenie and Co-operative Insurance in the UK, and many others elsewhere. Although telematics involves conventional structured data where insurers specify what they want to know and how they want to receive information in advance, it differs from other insurance applications in the volume of the data and the rate at which it is captured.
Currently, raw data is captured and pre-processed by the telematics device providers, who send summary data to the insurer. But this is likely to be an intermediary stage as telematics smartphone applications develop and manufacturers begin to equip their vehicles to be directly and continuously connected to the internet, making a broader range of data about the driver and vehicle accessible to insurers, says Sue Forder, associate partner and executive architect, business analytics and optimisation, IBM Global Business Services, who is based in London.
The next step is to combine conventional database content, such as policyholder records, with unstructured sources. This could be helpful in areas such as life insurance, where there are often few interactions with customers over the life of a policy and therefore little structured data in which to look for clues to events such as potential lapse or the exercise of annuity options. The new technologies enable companies to bring in other relevant internal but unstructured information, such as servicing reports from call centres. Firms are able to further combine this with external data that might suggest changes in a policyholder’s lifestyle, behaviour or financial status, which could influence lapse or option exercise – for example, information about wages or supermarket spending or gym membership related to the policyholder’s occupation or residential area.
“[The new data capabilities] enable insurers to look more broadly and deeply into the world in which the policyholder lives without necessarily being specific about the person, and allow them to start making inferences about an individual and their behaviour,” says Cantle.
Another potential application is to look for early-warning indicators for conduct or reputational risk. “At the moment, it can be years before firms realise they have built up problems with things like misselling, whereas it might be possible to spot signals sooner by monitoring sources of data, such as social networks, and drawing links between them,” says Wilkinson.
One area in which the full power of new technology is already being applied is liability catastrophe modelling. US-based technology company Praedicat has recently launched a platform for unstructured data and assembled a team of data scientists to monitor medical and scientific research for early indicators of emerging problems that might turn out to be the equivalent of asbestos or tobacco (see Insurance Risk, April, page 35). Another area where new technology is enabling better analysis of large volumes of data is policy-by-policy modelling (see box).
But there can also be dangers in the analysis of the new data sources and the way the analysis is used.
“The biggest risk is not in getting the maths wrong, because it’s rare that people who aren’t sufficiently skilled are actually doing the maths. The risk is on either side of the maths – either making inappropriate assumptions or setting up the problem incorrectly up front, or misinterpreting or misusing the results. These risks are doubly important to avoid because they not only can lead to bad results, they can make decision-makers distrustful of good analytics,” says Huls of Allstate.
Phansalkar at MassMutual agrees: “The biggest challenge in the use of [the new data capabilities] is the propensity to be lost in technology innovation without realising the full potential of the data. Big data can provide meaningful solutions only if relevant questions are being asked.”
A further challenge is the investment required, both in infrastructure and skills. Data science is a relatively new field for insurers, and qualified, experienced data scientists with insurance industry knowledge are rare. For infrastructure, cloud computing offers a low-cost, pay-as-you-go alternative to buying in systems, which could be helpful to an industry that is historically conservative in IT spending, says Cantle.
Meanwhile, there are always security issues with data, says Forder of IBM, and the more data a company collects, the bigger the security challenge. There can also be ethical and regulatory issues. “People don’t like companies sniffing around and looking at information sources that give away more about them than they are comfortable with,” says Cantle. The EU has proposed reform to data protection laws that, among other things, will give citizens easier access to data held on them, more control of how it is used and ‘a right to be forgotten’ – the ability to demand that data on them is deleted when there are no legitimate grounds for retaining it. All of this means insurers will need to treat the new data sources and capabilities with caution.
Early adopters claim they are already getting benefits from the new technology. Allstate says its sophisticated data capabilities enable the company “to solve problems and make decisions across a number of functions including pricing, underwriting, marketing, claims settlement processes and operations”. MassMutual’s Phansalkar adds that the new data capabilities provide the basic materials and tools to obtain a better and quicker understanding of the increasingly complex and competitive world in which insurers operate.
In the end, the ability that new data technology brings to respond rapidly to the market could be its most significant benefit, says Wilkinson of Towers Watson. “In highly competitive and dynamic markets such as the UK, marginal pricing advantages can quickly generate profits, whereas if you are not keeping pace and are slightly mispricing risk, the detriment can be hugely disproportionate and happen quickly. It is an arms race, but one insurers might not be able to avoid.”
Box: Policy-by-policy modelling
Modelling large, complex liability portfolios, especially of life products, presents a considerable computational challenge. To complete calculations in a time frame that is useful for risk management, asset and liability management (ALM) or capital calculations, many insurers use proxy methods, such as replicating portfolios, curve fitting or least squares Monte Carlo. However, in using such approximations there is a trade-off with accuracy and flexibility. Therefore, insurers have been looking for ways to make full policy-by-policy Monte Carlo-based modelling more tractable. New techniques and technologies are providing the solution.
“Monte Carlo simulation offers more complete information since all variables carry full probability distributions. Results are more reliable due to reduced model risk. When asset-side valuations and hedging strategies [are incorporated] into the model, one model can apply to many tasks – financial planning, actuarial modelling, product development, ALM and risk management,” says Timo Penttilä, partner, insurance and asset management services, at Helsinki-based software vendor Model IT.
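A toy sketch can make the idea of policy-by-policy Monte Carlo valuation concrete. The Python example below values each policy individually on a shared set of simulated interest-rate paths and returns a full distribution of portfolio values rather than a single number. The rate model (independent normally distributed annual rates), the parameters and the three sample policies are simplifying assumptions chosen for illustration, not a description of any vendor's method.

```python
import random
import statistics

def generate_rate_paths(n_scenarios, horizon, mean_rate=0.03, vol=0.01, seed=42):
    """Draw annual interest-rate paths; every policy is valued on the same scenarios."""
    rng = random.Random(seed)
    return [
        [rng.gauss(mean_rate, vol) for _ in range(horizon)]
        for _ in range(n_scenarios)
    ]

def present_value(sum_assured, years, path):
    """Discount a single maturity payment along one rate path."""
    factor = 1.0
    for rate in path[:years]:
        factor /= 1.0 + rate
    return sum_assured * factor

def portfolio_distribution(policies, paths):
    """Value each policy individually in each scenario, then total per scenario."""
    return [
        sum(present_value(sa, yrs, path) for sa, yrs in policies)
        for path in paths
    ]

# Hypothetical portfolio: (sum assured, term in years) for each policy.
policies = [(1000.0, 10), (2500.0, 5), (500.0, 20)]
paths = generate_rate_paths(n_scenarios=1000, horizon=20)
totals = portfolio_distribution(policies, paths)

if __name__ == "__main__":
    print("mean liability:", round(statistics.mean(totals), 2))
    print("std deviation: ", round(statistics.stdev(totals), 2))
```

The cost grows with the number of policies multiplied by the number of scenarios, which is why large portfolios push insurers towards either proxies or far more computing power.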
Model IT has developed a simplified method of building models that uses rules and objects rather than the conventional algorithmic, procedural approach. The method is encapsulated in a modelling language that translates users’ high-level rules and objects (such as cashflows or claims payments) into MATLAB mathematical code. The company’s cFrame modelling platform takes this code and optimises its execution using vector processing – a supercomputing technique for operating on arrays of data to solve large, complex mathematical problems – and algorithms that eliminate unnecessary duplication of data or calculations.
In tackling the challenge of policy-by-policy modelling, US-based SunGard enables users of its iWorks Prophet actuarial software to group similar policies (for example, by occupation or age band) and define small, repeatable loops of code that can be executed across a grid of processors. Prophet Enterprise, the production environment for the execution of Prophet models, contains a sequencing engine to schedule the code snippets efficiently, and can run models across thousands of processor cores. It also contains a scheduler to maximise the use of the available resources between the computational tasks, says John Winter, director of product management for iWorks Prophet at SunGard, who is based in London.
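The general pattern of grouping similar policies and running one small, repeatable unit of work per group across several workers can be sketched as follows. This is not Prophet code: the function names and sample policies are hypothetical, and a standard-library thread pool stands in for a grid of processors.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def age_band(age, width=10):
    """Return the lower bound of the age band, e.g. 37 -> 30."""
    return (age // width) * width

def group_policies(policies):
    """Group (policy_id, age, sum_assured) records by age band."""
    groups = defaultdict(list)
    for policy_id, age, sum_assured in policies:
        groups[age_band(age)].append((policy_id, sum_assured))
    return dict(groups)

def value_group(item):
    """Small repeatable unit of work: total sum assured for one group."""
    band, members = item
    return band, sum(sa for _, sa in members)

def run_on_workers(policies, max_workers=4):
    """Hand each group to a worker; a thread pool stands in for a compute grid."""
    groups = group_policies(policies)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(value_group, sorted(groups.items())))

if __name__ == "__main__":
    sample = [("P1", 34, 1000.0), ("P2", 37, 500.0), ("P3", 52, 2000.0)]
    print(run_on_workers(sample))
```

Because each group is processed independently, the same function could in principle be dispatched to separate machines, which is where a scheduler that balances tasks against available resources becomes important.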
Even with these optimisation techniques, policy-by-policy modelling of large complex portfolios remains a time-consuming computational challenge – which is where cloud computing comes in. A range of cloud services are now available from companies such as Microsoft, Amazon and IBM.
“Distributed cloud computing (DCC) is a key for fast responses with large data sets. DCC offers hundreds or thousands of workstations to be used in parallel inside a cloud service. DCC offers high computational power at low cost; you only pay for what you use and not for idle hours,” says Penttilä of Model IT. SunGard offers its own cloud service, with the ability to optimise resource configuration for Prophet-based modelling.
The need for high-performance policy-by-policy modelling is only likely to increase, says SunGard’s Winter. “As self-assessment regulations, such as the Own Risk and Solvency Assessment, take hold, they increase the tendency for managers to ask more difficult questions about the risks they face, which may call the validity of proxies into question,” he says.