In the latest in our series of blogs focusing on data transformation in General Insurance we explore how to approach the subject of data architecture within the context of a broader data transformation programme.
So, should you implement Hadoop, Amazon Redshift, Azure Data Warehouse or SQL Server?
This is, in essence, entirely the wrong question to be asking when commencing any transformation, but especially in the data arena: the generally open nature of these platforms means that a technology-led approach presents a real risk of embarking on an expensive project only to be left searching for a problem to solve.
You need to start with clarity in the business outcome desired and, whilst the challenge will likely be unique to each organisation, insurers who have delivered successful data transformation have always had clarity on:
- The problem to solve
- Why it needs solving, and for whom
- Separation between business capability to be delivered and technology platform needs.
In terms of business capability there are, fundamentally, only two use cases for data solutions to satisfy (albeit with myriad specific requirements within them):
- Facts – the data provides an accurate representation of what has happened in the business and can provide trustworthy information on business performance to support decisions, such as policies in force, total premiums by exposure or claims paid in the last quarter
- Insights – the data provides support for projections and testing hypotheses through establishing patterns and correlations.
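To make the "facts" use case concrete, here is a minimal sketch, using Python's built-in sqlite3 and an entirely hypothetical, simplified policy table (the schema and figures are invented for illustration), of the kind of standardised query that should return one trustworthy answer for policies in force and premium by line of business:

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE policy (
    policy_id INTEGER PRIMARY KEY,
    line      TEXT NOT NULL,   -- e.g. 'Motor', 'Property'
    premium   REAL NOT NULL,
    status    TEXT NOT NULL    -- 'IN_FORCE' or 'LAPSED'
);
""")
conn.executemany(
    "INSERT INTO policy VALUES (?, ?, ?, ?)",
    [
        (1, "Motor",    500.0, "IN_FORCE"),
        (2, "Motor",    650.0, "LAPSED"),
        (3, "Property", 900.0, "IN_FORCE"),
        (4, "Property", 750.0, "IN_FORCE"),
    ],
)

# A 'facts' query: policies in force and total premium by line.
rows = conn.execute("""
    SELECT line,
           COUNT(*)     AS policies_in_force,
           SUM(premium) AS total_premium
    FROM policy
    WHERE status = 'IN_FORCE'
    GROUP BY line
    ORDER BY line
""").fetchall()

for line, count, premium in rows:
    print(line, count, premium)
```

The value here is not the SQL itself but the precondition it exposes: the query only yields a trustworthy fact if every source system agrees on what "in force" and "premium" mean.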
All of the platforms mentioned above can meet both of these use cases. However, the inherent challenges are very distinct:
- To provide accurate and trustworthy facts the data needs to be well understood and a level of standardisation is required across data sources
- Hypothesis testing or predictive modelling is computationally complex and will often require large-scale data sets to provide representative results.
When looked at as a single solution, this tension between consistently interpreting data and the need for significant scale in storage and compute power can point to a need for a major platform that solves both at once.
“But just get all the data in a Data Lake, right?”
No. A platform choice does not inherently solve a problem. Data warehouses and big data solutions are complementary and can co-exist on appropriate (and, if needed, separate) platforms. Therefore, while the overall architecture should be designed in advance, each fundamental outcome can be tackled separately, and the data architect can focus on ensuring that projects deliver solutions that are:
- Deliverable – the most compelling solution on paper means nothing if the organisation and its partners do not have the capability to execute
- Accessible – data locked away in a shiny box is no use. It needs to be accessible by users and applications
- Timely – data might need to be available in real time, intra-day or daily. However, waiting for weekend or month-end batches is no longer acceptable
- Extensible – start with a solid foundation. New requirements will always appear; what is important is the ability to deliver quickly with quality, trusted data, and to add tools or functionality rapidly
- Scalable – some workloads are very spiky; if the ability to scale and contract storage and compute power is critical, then build the capability early
- Secure – never compromise the ability to secure and protect the data. The reputational and financial impact from a data breach can be huge.
For our thoughts on data transformation and the Underwriting and Pricing function watch out for the next blog in our series.