In “Global Risk – is Software the ‘Vlieg in de soep’?” (Dutch for “fly in the soup”), we first flagged a potential new threat to the economy and society. Alongside threats from war, volcanoes and global heating, we saw that our digital society exposes us to a less remarked-upon threat: software failure.
We have since developed this thinking further, including through a joint roundtable with the National Preparedness Commission (NPC), which is tasked with promoting better preparedness for a major crisis or incident. The report from the roundtable concluded that “The software element of digital systems failure is a cost to economy and society which will only increase as software has become a utility, is in wider usage, and more vulnerable to failure.”
Beyond the threat from catastrophic software failures, we also began to think about their more insidious effect on productivity.
It is well known that the UK’s productivity and productivity growth have lagged behind those of the US and, for most of the last two decades, those of the eurozone. While it is only human to look for a silver bullet such as AI, here we suggest measures that would underpin that silver bullet if and when it arrives, and could also raise productivity in the meantime.
The authors have been exploring the risks to UK productivity from software failure for the last two years through a BCS Working Group. Their report “Digitalisation – software risk and resilience – a policy think piece” compared the cost of software failures to the economy with that of road accidents, and found the two to be comparable.
Software is now pervasive, and software services are delivered through complex, tightly coupled systems with unpredictable failure modes. This calls for new approaches to measuring, mitigating and managing software risk and resilience.
The Department for Culture, Media & Sport (DCMS), together with the Department for Science, Innovation and Technology (DSIT), issued a “Call for views on software resilience and security for businesses and organisations” in February 2023. The BCS response to the call emphasised that digital (software) failures are already hurting UK productivity. Many services, including those in sectors not usually thought of as digital, depend on software for their delivery, from ecommerce to entertainment to government services. In economics, productivity measures output per unit of input, such as labour, capital or any other resource. Software failures reduce the number of hours available to users of digitally enabled services, the value of the hours they do work, or both.
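As a rough illustration of that mechanism, the sketch below treats labour productivity as output divided by hours worked and shows how hours lost to a software failure pull it down. The function names and all figures are hypothetical, and the sketch assumes that output is produced only while the software is available, while every paid hour still counts as input.

```python
def labour_productivity(output: float, hours_worked: float) -> float:
    """Output per hour of labour input."""
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return output / hours_worked


def productivity_with_outage(output_per_productive_hour: float,
                             hours_paid: float,
                             hours_lost: float) -> float:
    """Measured productivity when some paid hours are lost to a software failure.

    Assumption (hypothetical): output is produced only in hours when the
    software is available, but all paid hours still count as labour input.
    """
    productive_hours = max(hours_paid - hours_lost, 0.0)
    output = output_per_productive_hour * productive_hours
    return labour_productivity(output, hours_paid)


# Hypothetical team: 1,000 paid hours, 50 units of output per productive hour.
baseline = productivity_with_outage(50.0, 1_000.0, 0.0)
with_outage = productivity_with_outage(50.0, 1_000.0, 40.0)
print(baseline, with_outage)  # 50.0 48.0
```

In this made-up case a 40-hour outage across 1,000 paid hours cuts measured productivity by 4%, although nothing about the workers' own efficiency has changed.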
One of the recommendations of the BCS response was that government could lead on sharing information about breaches in digital services. Metrics for this are defined under the Network and Information Systems (NIS) Directive and Regulations for Relevant Digital Service Providers (RDSPs). The four metrics are: availability (user hours lost); integrity, authenticity or confidentiality (user data compromised, or services delivering wrong information); risk (to health, safety or life); and material damage to users (financial impact).
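One way to picture these metrics is as a simple incident record checked against reporting thresholds. In the minimal sketch below, only the 750,000 user-hour figure comes from this article; the other thresholds, the class and the function names are hypothetical placeholders, not the values or structures used in the NIS rules.

```python
from dataclasses import dataclass


@dataclass
class IncidentImpact:
    """Impact of one service incident, recorded against the four NIS-style metrics."""
    user_hours_lost: float          # availability
    users_data_compromised: int     # integrity, authenticity or confidentiality
    risk_to_health_safety_life: bool
    material_damage: float          # financial impact on users, in one currency


# Hypothetical reporting thresholds. Only the 750,000 user-hour figure is
# taken from the text; the other two values are illustrative placeholders.
THRESHOLDS = {
    "user_hours_lost": 750_000,
    "users_data_compromised": 100_000,
    "material_damage": 1_000_000,
}


def breached_metrics(incident: IncidentImpact) -> list[str]:
    """Return the metrics on which this incident crosses a reporting threshold."""
    breaches = []
    if incident.user_hours_lost > THRESHOLDS["user_hours_lost"]:
        breaches.append("availability")
    if incident.users_data_compromised > THRESHOLDS["users_data_compromised"]:
        breaches.append("integrity/authenticity/confidentiality")
    if incident.risk_to_health_safety_life:
        breaches.append("risk")
    if incident.material_damage > THRESHOLDS["material_damage"]:
        breaches.append("material damage")
    return breaches


outage = IncidentImpact(user_hours_lost=900_000, users_data_compromised=0,
                        risk_to_health_safety_life=False, material_damage=0.0)
print(breached_metrics(outage))  # ['availability']
```

An incident is reportable if the returned list is non-empty; any risk to health, safety or life is treated as a breach on its own, with no numeric threshold.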
This “governance by accident” model is also being proposed for AI systems. An accident could be defined as a breach of service levels, measured against metrics such as those above. This approach measures the impact of a failure rather than investigating its technical cause. The insurance industry is actively considering the impact approach, because the legal costs of proceedings to establish the root cause of a failure, or whether it was state-sponsored or private hacking, mount up over years.
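To make the impact-based idea concrete, here is a minimal sketch in which an “accident” is simply a breach of a service level, expressed as an allowance of user hours that may be lost in a period. The `OutageWindow` record, the function names and the 100,000 user-hour allowance are all hypothetical. The design choice that matters is that the record has no field for root cause, so an incident can be classified without waiting to establish who or what caused it.

```python
from dataclasses import dataclass


@dataclass
class OutageWindow:
    """One period of degraded service. Deliberately records impact, not cause."""
    duration_hours: float
    affected_users: int


def user_hours_lost(windows: list[OutageWindow]) -> float:
    """Total availability impact: affected users multiplied by outage duration."""
    return sum(w.duration_hours * w.affected_users for w in windows)


def is_accident(windows: list[OutageWindow], allowed_user_hours: float) -> bool:
    """Treat any breach of the agreed user-hour allowance as an accident."""
    return user_hours_lost(windows) > allowed_user_hours


# Hypothetical month: two outages for a service with a 100,000 user-hour allowance.
month = [OutageWindow(1.5, 40_000), OutageWindow(0.5, 120_000)]
print(user_hours_lost(month), is_accident(month, 100_000))  # 120000.0 True
```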
The impact approach has its complications, but there are precedents. DCMS publishes the names of RDSP organisations and the fines levied for loss of, or unauthorised access to, customer data. As a regulator, it is empowered to require the reporting of breaches above a threshold value of the metrics. It is also suggested that the regulators of Operators of Essential Services (OES) should impose similar requirements. These operators span energy (electricity, oil and gas); transport (air, rail, water and road); health (healthcare settings, including hospitals, private clinics and online settings); water (drinking water supply and distribution); and digital infrastructure (top-level domain (TLD) name registries, domain name system (DNS) service providers and Internet exchange point (IXP) operators).
Could a “governance by accident” approach to reporting on resilience also help to address other factors that are affecting UK productivity?
“User hours lost” came to mind when one of us was recently caught up in a blockage of the M1 and delayed by two hours. The cause was apparently a lorry that had broken through the safety fence on a bridge over the motorway and was left hanging above the carriageway. A two-hour delay for all the stranded vehicles translates into a likely loss well in excess of the 750,000 user hours in the NIS Directive definition. However, a quick look at the Department for Transport website suggests that it does not report user hours lost, even though lost productive time is surely an important contributor to the UK’s low productivity.
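The availability metric carries over directly to a road closure: user hours lost are the number of people held up multiplied by the length of the delay. The sketch below shows that arithmetic. The function names and the average occupancy of 1.5 people per vehicle are hypothetical assumptions, since traffic counts for this incident are not given. It also works out the break-even point: at a two-hour delay, reaching 750,000 user hours means 375,000 people held up.

```python
def road_user_hours_lost(vehicles: int, avg_occupancy: float, delay_hours: float) -> float:
    """User hours lost = number of people delayed x length of the delay."""
    return vehicles * avg_occupancy * delay_hours


def people_needed(threshold_user_hours: float, delay_hours: float) -> float:
    """How many people must be delayed for the total to reach the threshold."""
    return threshold_user_hours / delay_hours


print(people_needed(750_000, 2.0))  # 375000.0
# Hypothetical: 250,000 vehicles at an assumed 1.5 occupants each, delayed 2 hours.
print(road_user_hours_lost(250_000, 1.5, 2.0))  # 750000.0
```

Substituting real traffic counts and occupancy data for an incident would show how its impact compares with the threshold.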
As the UK looks to invest in strategic areas of science and technology, it should perhaps also explore, in parallel, how to measure and tackle the digital and physical infrastructure resilience factors that are holding back UK productivity.
The NIS framework above offers a possible starting point: it provides a language for describing the lack of infrastructure resilience that we all know is cutting UK productivity off at the knees. Further, becoming known for reliable digital services in our new, complex environment would add to the UK’s global competitiveness.