Pissing In The Well
Epidemiology is the branch of medicine which deals with the incidence, distribution, and control of diseases. One of the legends of epidemiology surrounds John Snow and the Broad Street cholera outbreak of 1854. Broad Street lay in Soho, London (not the more famous and noble Broad Street Ward of the City of London, where I am the Alderman), in an area also known as Golden Square; the street is today Broadwick Street. 616 people perished during the outbreak.
Snow was determined to use scientific methods to analyse the epidemic while it was in progress. There were two contrasting theories of disease transmission: miasma theory and germ theory. Snow used mapping and statistics to identify the Broad Street water pump, one metre from a cesspit, as the source of the outbreak, and waterborne germ transmission as the most likely means of spread. Snow’s innovation was to compare death rates in areas served by two water companies that drew water from the Thames with the Broad Street pump death rates. Snow’s work persuaded parish authorities to disable the Broad Street pump by removing its handle. People of the parish were contaminating their own water supply with their own waste, then drinking it. In terse terms, ‘pissing in the well’.
Like all legends, it’s more complicated than that. Snow’s conclusions were trampled in the scramble to escape blame that followed the outbreak, as the Board of Health ultimately attributed the 1854 epidemic to miasma. Still, the events behind the legend led to the establishment of epidemiology as an area of medical science and of epidemiologists as disease detectives.

The Well Of The Internet
When it comes to AI, are we pissing in the well? The Economist seems to think so. Impure drivel made The Economist choleric: in "And The Economist’s word of the year for 2025 is…" (3 December 2025), it chose ‘slop’. "Slop merchants clog up the internet with drivel."
But it’s a big well, right? The major source of training data for most large language models (LLMs) is the vast ocean of the internet and world wide web. AIs frequently ‘hallucinate’ or, perhaps more accurately, ‘lie’. As AIs are data-driven, the proximate cause of most bad results is bad training data. Other interactions matter too, from natural language prompting and multimodal inputs to defining context, specifying roles, and setting clear constraints, but the starting point is data. If the starting point is a cesspool of data, then the end result will be polluted.
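That feedback loop can be sketched in a few lines. The toy simulation below is a hypothetical illustration only (the Gaussian ‘world’, sample size, and generation count are arbitrary assumptions, not anyone’s production pipeline) of the effect researchers call model collapse: when each generation of a model is trained solely on the previous generation’s output, the diversity of the original data quietly drains away.

```python
import random
import statistics

# Toy sketch of 'pissing in the well': each generation of a 'model'
# is fitted only to samples generated by the previous generation.
# The Gaussian world, sample size, and generation count are arbitrary
# assumptions for illustration.

random.seed(42)
SAMPLE_SIZE = 200
GENERATIONS = 50

# Generation 0: 'real' data drawn from a standard normal distribution.
data = [random.gauss(0.0, 1.0) for _ in range(SAMPLE_SIZE)]

for generation in range(1, GENERATIONS + 1):
    # 'Train' on the current well: estimate mean and spread from the data.
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    # 'Publish' back into the well: the next generation sees only
    # synthetic samples drawn from the fitted model.
    data = [random.gauss(mu, sigma) for _ in range(SAMPLE_SIZE)]
    if generation % 10 == 0:
        print(f"generation {generation:2d}: fitted std dev = {sigma:.3f}")

# The fitted standard deviation tends to drift below 1.0 and, over
# enough generations, collapses: the well gets shallower every time
# it is refilled from itself.
```

No new sewage is even needed for the degradation; recycling the well’s own contents is enough.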
Let’s take a fictional 16-year-old AI user, call him Michael, and his first girlfriend, Elisabeth. Michael wants to impress Elisabeth. He remembers that she is a fan of William Goldman’s classic novel, “The Princess Bride”. In a fit of inspiration he asks his favourite AI to generate a 200-page novel, “Elisabeth The Princess”, where he is Westley, she is Buttercup, and the school bully is cruel Prince Humperdinck. Michael rephrases that: 600 pages. “As you wish”, it is done. As Michelangelo must have noted (sic), "Every great masterpiece has its cheap knock-off". Whatever Elisabeth may or may not think of Michael’s seduction tactics, 600 pages of that cheap knock-off are highly likely to be shared electronically or posted online or both, adding yet more rubbish to the source materials and corrupting the well further.
The corrupted well is here. Already in 2024 I found myself writing a short blog post about a new book on the artist William Alister Macdonald, the “Tahitian Turner”. I wanted a quick precis of this slightly obscure but delightful artist to help promote the book, and turned to ChatGPT for three or four succinct sentences. What a palaver. Several repeated attempts and forty minutes of fact-checking later, Macdonald had been given several different yet confidently specific birth and death dates, had been Australian, Scottish, English, British, or African, was good or bad, derivative or original, and clearly had some spelling issues with his own name. I gave up and did it myself, but not before noting the various versions vigorously vociferating their verdicts with vim.

Can You Spell Epidemiology Without An AI, Or What’s A Poor Water Carrier To Do?
As noted, the internet is a world wide well. Snow found analysing the water supplies of London a large task; some AI firms, by contrast, already feel constrained by not having enough digital material to train their models. The AI evolution has relied on five factors: algorithms, chips, connectivity, energy, and data. Algorithms: the fundamental approaches haven't progressed as much as people think; the innovation is in application. Chips: an accidental gaming architecture that turned out to be appropriate and became standard. Connectivity: the most underappreciated enabler, transforming isolated experiments into global platforms. Energy: intense consumption has created business models where people sell paper in revenue-negative companies to subsidise grotesquely energy-expensive queries in the hope of building a sustainable, addictive future market. Data: expect data, particularly in the non-static sense of flows of information, to become increasingly valuable, and increasingly ‘owned’ and protected.
At least three implications follow. First, there is a role for internet epidemiology: data detectives who track down and cut off the most noxious sources in near real time. Data management issues will increase in importance, a tall task that would drag this short article into the oft-trodden tracks of content enforcement, monitoring, validation, and editing, and into discussions we must continue to pursue on data rights, censorship, and ownership.
Second, where there’s muck there’s brass: money can be made from dirty, unpleasant, or difficult work. Data management offers great opportunities for companies and countries. For companies, there are products and services to sell that give customers a competitive edge in AI: data management environments, cleansing, benchmarks, test sets, or data sanctuaries.
Third, data-data-data. Control of information is where to seek an edge in AI. The shift from clean datasets to an internet increasingly polluted with AI-generated content poses existential challenges for AI. If the algorithms are open source, the chips are identikit, connectivity is ubiquitous, and energy is subsidised, then control of data is control of AI. The AI opportunities will be ‘data-data-data’: intellectual property, dispute adjudication, and data exchanges, in countries that foster an ecosystem based on rigorous data management rights.
As C J Cherryh wrote, “Trade isn’t about goods. Trade is about information. Goods sit in the warehouse until information moves them.” The Economist, and this article, believe that we can start 2026 on a note of 'sloptimism': "If the news ecosystem is sodden with slop, trust in established organisations might rebound. (Research has found that, after being asked to distinguish AI photographs from real ones, test subjects show a greater willingness to pay for a respectable newspaper.) If social-media sites become congested with slop, either those platforms will have to get serious about content moderation or else their users will shut them off."
Imagine a country with good data defences and cybersecurity. A country with good data rule of law, with arbitration, mediation, expert determination, and conciliation creating an overall fair environment where legal redress is pragmatic, predictable, and cost-effective. A country with a strong tradition of open, competitive markets and free trade. A country with strong data management infrastructure and depth in skills such as law, statistics, visualisation, encryption, and turning data into information. Let’s be sloptimistic and get building those countries.