A handful of bad data can ‘poison’ even the largest AI models, researchers warn | Fortune

Hello and welcome to Eye on AI…In this edition: A new Anthropic study reveals that even the biggest AI models can be ‘poisoned’ with just a few hundred documents…OpenAI’s deal with Broadcom….Sora 2 and the AI slop issueand corporate America spends big on AI. 

Hi, Beatrice Nolan here. I’m filling in for Jeremy, who is on assignment this week. A recent study from Anthropic, in collaboration with the UK AI Security Institute and the Alan Turing Institute, caught my eye earlier this week. The study focused on the “poisoning” of AI models, and it undermined some conventional wisdom within the AI sector.

The research found that the introduction of just 250 bad documents, a tiny proportion when compared to the billions of texts a model learns from, can secretly produce a “backdoor” vulnerability in large language models (LLMs). This means that even a very small number of malicious files inserted into training data can teach a model to behave in unexpected or harmful ways when triggered by a specific phrase or pattern.

This idea itself isn’t new; researchers have cited data poisoning as a potential vulnerability in machine learning for years, particularly in smaller models or academic settings. What was surprising was that the researchers found that model size didn’t matter.

Small models along with the largest models on the market were both effected by the same small amount of bad files, even though the bigger models are trained on far more total data. This contradicts the common assumption that as AI models get larger they become more resistant to this kind of manipulation. Researchers had previously assumed attackers would need to corrupt a specific percentage of the data, which, for larger models would be millions of documents. But the study showed even a tiny handful of malicious documents can “infect” a model, no matter how large it is.

The researchers stress that this test used a harmless example (making the model spit out gibberish text) that is unlikely to pose significant risks in frontier models. But the findings imply data-poisoning attacks could be much easier, and become much more prolific, than people originally assumed.

Safety training can be quietly unwound

What does all of this mean in real-world terms? Vasilios Mavroudis, one of the authors of the study and a principal research scientist at the Alan Turing Institute, told me he was worried about a few ways this could be scaled by bad actors.

“How this translates in practice is two examples. One is you could have a model that when, for example, it detects a specific sequence of words, it foregoes its safety training and then starts helping the user carry out malicious tasks,” Mavroudis said. Another risk that worries him was the potential for models to be engineered to refuse requests from or be less helpful to certain groups of the population, just by detecting specific patterns in the request or keywords.

“This would be an agenda by someone who wants to marginalize or target specific groups,” he said. “Maybe they speak a specific language or have interests or questions that reveal certain things about the culture…and then, based on that, the model could be triggered, essentially to completely refuse to help or to become less helpful.”

“It’s fairly easy to detect a model not being responsive at all. But if the model is just handicapped, then it becomes harder to detect,” he added.

Rethinking data ‘supply chains’

The paper suggests that this kind of data poisoning could be scalable, and it acts as a warning that stronger defenses, as well as more research into how to prevent and detect poisoning, are needed.

Mavroudis suggests one way to tackle this is for companies to treat data pipelines the way manufacturers treat supply chains: verifying sources more carefully, filtering more aggressively, and strengthening post-training testing for problematic behaviors.

“We have some preliminary evidence that suggests if you continue training on curated, clean data…this helps decay the factors that may have been introduced as part of the process up until that point,” he said. “Defenders should stop assuming the data set size is enough to protect them on its own.”

It’s a good reminder for the AI industry, which is notoriously preoccupied with scale, that bigger doesn’t always mean safer. Simply scaling models can’t replace the need for clean, traceable data. Sometimes, it turns out, all it takes is a few bad inputs to spoil the entire output.

With that, here’s more AI news.

Beatrice Nolan

bea.nolan@fortune.com

FORTUNE ON AI

A 3-person policy nonprofit that worked on California’s AI safety law is publicly accusing OpenAI of intimidation tacticsSharon Goldman 

Browser wars, a hallmark of the late 1990s tech world, are back with a vengeance—thanks to AI Beatrice Nolan and Jeremy Kahn

Former Apple CEO says ‘AI has not been a particular strength’ for the tech giant and warns it has its first major competitor in decades — Sasha Rogelberg

EYE ON AI NEWS

OpenAI and Broadcom have struck a multibillion-dollar AI chip deal. The two tech giants have signed a deal to co-develop and deploy 10 gigawatts of custom artificial intelligence chips over the next four years. Announced on Monday, the agreement is a way for OpenAI to address its growing compute demands as it scales its AI products. The partnership will see OpenAI design its own GPUs, while Broadcom co-develops and deploys them beginning in the second half of 2026. Broadcom shares jumped nearly 10% following the announcement. Read more in the Wall Street Journal.

 

The Dutch government seizure of chipmaker Nexperia followed a U.S. warning. The Dutch government took control of chipmaker Nexperia, a key supplier of low-margin semiconductors for Europe’s auto industry, after the U.S. warned it would remain on Washington’s export control list while its Chinese chief executive, Zhang Xuezheng, remained in charge, according to court filings cited by the Financial Times. The Dutch economy minister Vincent Karremans removed Zhang earlier this month before invoking a 70-year-old emergency law to take control of the company, citing “serious governance shortcomings,”  Nexperia was sold to a Chinese consortium in 2017 and later acquired by the partially state-owned Wingtech. The dispute escalated after U.S. officials told the Dutch government in June that efforts to separate Nexperia’s European operations from its Chinese ownership were progressing too slowly. Read more in the Financial Times.

 

California becomes the first state to regulate AI companion chatbots. Governor Gavin Newsom has signed SB 243, making his home state the first to regulate AI companion chatbots. The new law requires companies like OpenAI, Meta, Character.AI, and Replika to implement safety measures designed to protect children and vulnerable users from potential harm. It comes into effect on January 1, 2026, and mandates age verification and protocols to address suicide and self-harm. It also introduces new restrictions on chatbots posing as healthcare professionals or engaging in sexually explicit conversations with minors. Read more in TechCrunch.

EYE ON AI RESEARCH

A new report has found corporate America is going all-in on artificial intelligence. The annual State of AI Report found that generative AI is crossing a “commercial chasm,” with adoption and retention of AI technology up, while spend grows. According to the report, which analyzed data from Ramp’s AI Index, paid AI adoption among U.S. businesses has surged from 5% in early 2023 to 43.8% by September 2025. Average enterprise contracts have also ballooned from $39,000 to $530,000, with Ramp projecting a further $1 million in 2026 as pilots develop into full-scale deployments. Cohort retention—the share of customers who keep using a product over time—is also strengthening, with 12-month retention rising from 50% in 2022 to 80% in 2024, suggesting AI pilots are being transferred into more consistent workflows.

AI CALENDAR

Oct. 21-22: TedAI San Francisco.

Nov. 10-13: Web Summit, Lisbon. 

Nov. 26-27: World AI Congress, London.

Dec. 2-7: NeurIPS, San Diego.

Dec. 8-9: Fortune Brainstorm AI San Francisco. Apply to attend here.

BRAIN FOOD

Sora 2 and the AI slop issue. OpenAI’s newest iteration of its video-generation software has caused quite a stir since it launched earlier this month. The technology has horrified the children of deceased actors, caused a copyright row, and sparked headlines including: “Is art dead?”

The death of art seems less like the issue than the inescapable spread of AI “slop.” AI-generated videos are already cramming people’s social media, which raises a bunch of potential safety and misinformation issues, but also risks undermining the internet as we know it. If low-quality, mass-produced slop floods the web, it risks pushing out authentic human content and siphoning engagement away from the content that many creators rely on to make a living.

OpenAI has tried to watermark Sora 2’s content to help viewers tell AI-generated clips from real footage, automatically adding a small cartoon cloud watermark to every video it produces. However, a report from 404 Media found that the watermark is easy to remove and that multiple websites already offer tools to strip it out. The outlet tested three of the sites and found that each could erase the watermark within seconds. You can read more on that from 404 Media here. 

Great Job Beatrice Nolan & the Team @ Fortune | FORTUNE Source link for sharing this story.

#FROUSA #HillCountryNews #NewBraunfels #ComalCounty #LocalVoices #IndependentMedia

Latest articles

spot_img

Related articles

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Leave the field below empty!

spot_img
Secret Link