Both of us – the co-authors – spent several years designing, coding and testing software. We have a visceral understanding of what can go wrong, even before the human element – the user – is added to the mix. And then there is the nightmare of finding that the data the system relied on was unavailable, faulty or incomplete.
Any system depends on three things: an algorithm that is appropriate and complete; an accurate implementation; and reliable data.
An example is a system for making soup. The algorithm is the recipe: hopefully it has been tested. The implementation could be by an experienced cook or a robot (though if a robot is to make the soup, a much more detailed set of instructions is needed). The type of soup depends on the ingredients, and how it tastes depends on their type and quality.
Software is the same. An algorithm is designed to fulfil the system’s purpose. The algorithm may be imperfect because it fails to cover some of the conditions or does not reflect the real world. It is executed in code, which may be an imperfect implementation. The outputs are based on the data used by the software. Problems with data quality are so widespread that the term “GIGO” – garbage in, garbage out – is well known.
Can this framework illuminate the use of a software model to influence government policy during the early stages of the Covid-19 pandemic? The algorithms were based on assumptions about the rate and type of infection, and about the threat to human life from the virus; these assumptions turned out to be false. The implementation in code was suspect, according to a number of observers. The available input data was incomplete and misleading: during the early stages of the pandemic, those with suspected symptoms were told not to call their GPs or the NHS helpline unless they needed an ambulance. As a result, for instance, the official statistics showed 3 cases in West Berkshire while one of us was personally aware of 8 people with all the symptoms. Hence, government scientists and politicians were depending on misleading projections, based on inadequate algorithms, code implementation and data.
As our dependence on IT increases – accelerated by Covid-19 – so does the number of stories of software failures harming people and costing lives. For instance:
These and countless other examples – from airlines, banks, Facebook, and across the commercial and government spectrum – suggest that software engineering standards for development (design and coding) and testing are widely ignored, where they even exist. Software engineering is a young discipline: as the British Computer Society describes, published standards cover parts of the software life cycle, but not all of it.
It could be argued that the lack of application of software engineering standards will be remedied over time by market forces – for example, through insurance rates that penalise organisations with a poor track record of software failure. Failure is often what drives innovation: successful innovation builds on understanding both how to improve the design and implementation and what caused the failure.
In the cases above, people were able to recognise and remedy the errors – albeit after deaths or financial loss. The next generation of software systems is likely to outstrip the ability of humans to check the results in detail. Many AI-assisted systems are currently subject to human verification: for instance, the auto-correct feature on text messages, teaching assistants, customer service robots and autonomous vehicles.
However, AI systems are increasingly being used in circumstances where there is no human ability to check the logic or to query the outcomes. For instance,
In these cases, the built-in weaknesses of the software cannot be checked*. We described above three types of error that persist in many released software systems: errors of algorithm, of implementation and of data. These must also be expected in AI systems.
Many experts are also concerned by society’s growing reliance on algorithms, many of which we only vaguely understand. One of the fundamental challenges of machine learning is that models depend on data supplied by humans, and that data is likely to have been selected according to biases. Nor is it just the data: algorithms are developed and implemented by people, and people have in-built biases. We need to think about how and where the machine learning algorithms used today in healthcare, education and criminal justice are making biased judgements – and doing so without a mechanism for querying the outcomes.
One of us has been reading Robert Harris’ The Second Sleep, about an England several centuries after the collapse of our current society. An ancient artifact is discovered, written ten years before the collapse, identifying “six possible scenarios that fundamentally threaten the existence of our advanced science-based way of life”: climate change; nuclear exchange; super-volcano eruption, leading to rapidly accelerated climate change; asteroid strike, also causing accelerated climate change; general failure of computer technology due to cyber warfare, an uncontrollable virus or solar activity; and a pandemic resistant to antibiotics.
The list is reasonably prescient, although the recent pandemic has focused attention on virus-led pandemics rather than antibiotic resistance. But we think that – right up there – we need to include the risk of software malfunction.
So, yes, we do think there is a fly in the soup. In fact, it is a huge and dangerous insect. We think that software is a problem flying just under the radar, ready to fall into the soup and leave devastation in its wake. It could crash our planet. As we continue to depend on systems that are faulty, that we do not “understand”, and/or that are based on incomplete or faulty data or assumptions, the danger increases.
Patricia Lustig and Gill Ringland, Fellow, British Computer Society, September 2020
* This issue is discussed in depth in Bob McDowell's contribution earlier this year: