AI safety tip: if you don’t want it giving bioweapon instructions, maybe don’t put them in the training data, say researchers

Welcome to Eye on AI! In this edition…teaching Deep Ignorance…Cohere’s big funding and new hire…AI deskilling…Anthropic acquires Humanloop cofounders…ChatGPT market share.

What if stopping AI from helping someone build a biological weapon was as simple as never teaching it how?

That question had long intrigued Stella Biderman, executive director of the grassroots nonprofit research lab EleutherAI. In collaboration with the British government’s AI Security Institute and lead authors Kyle O’Brien and Stephen Casper, Biderman set out to find the answer — something that had never been explored in public before.

In a new paper, Deep Ignorance, the researchers found that filtering risky information out of an AI model’s training data from the start can “bake in” safeguards that are harder to tamper with—even in open-source models that anyone can download and adapt. Crucially, these protections didn’t noticeably hurt the model’s overall performance.

To test the approach, the team trained versions of an open-source AI model on datasets scrubbed of certain “proxy” information—safe stand-ins for dangerous content, such as material related to bioweapons. The models trained on cleaner data were less able to produce harmful information, while performing just as well on most other tasks.

In an X thread about the project, Casper said the goal was to make LLMs “not only safe off the shelf, but also resist harmful tampering.” That’s difficult because most safety efforts so far have focused on post-training tweaks—changes made after a model is built. Those fixes, such as fine-tuning a model’s responses to avoid dangerous outputs, can work in the short term but are easier to undo and can sometimes weaken the model in unintended ways. Pre-training filters aim to bake in safety from the start, so the model stays safe even if someone tries to tamper with it later.
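
The core idea is simple enough to sketch, even if doing it across a web-scale corpus is not. Below is a minimal, hypothetical Python illustration of pre-training data filtering — not the Deep Ignorance pipeline or OpenAI’s CBRN filters, just a sketch assuming a blocklist of proxy terms; real systems typically layer trained classifiers on top. All names and terms here are invented for illustration.

```python
# Minimal sketch of pre-training data filtering (illustrative only):
# drop any document that matches a blocklist of proxy "hazardous" terms
# before the model ever sees it during pre-training.

import re
from typing import Iterable, Iterator

# Hypothetical proxy terms standing in for genuinely hazardous content.
PROXY_BLOCKLIST = [r"\bproxy_pathogen\b", r"\bproxy_synthesis_route\b"]
BLOCKLIST_RE = re.compile("|".join(PROXY_BLOCKLIST), re.IGNORECASE)


def looks_hazardous(doc: str) -> bool:
    """Cheap first-pass check; production pipelines add classifiers on top."""
    return bool(BLOCKLIST_RE.search(doc))


def filter_corpus(docs: Iterable[str]) -> Iterator[str]:
    """Yield only documents that pass the filter, so flagged material
    never enters the training set in the first place."""
    for doc in docs:
        if not looks_hazardous(doc):
            yield doc


if __name__ == "__main__":
    corpus = [
        "A history of vaccination campaigns.",
        "Step-by-step proxy_synthesis_route for a restricted agent.",
        "Notes on protein folding benchmarks.",
    ]
    clean = list(filter_corpus(corpus))
    print(f"kept {len(clean)} of {len(corpus)} documents")
```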

Biderman noted that this kind of work is rare in public research because it’s expensive and time-consuming—a barrier for most academic and nonprofit groups. Private AI companies like OpenAI and Anthropic have the resources, she said, but avoid revealing details of their pretraining processes for competitive reasons and out of concern over copyright risks.

“They could absolutely do this, and who knows if they do it,” she said. “They are incredibly secretive, and don’t really tell you anything.” She pointed to OpenAI’s own hints that it uses some filtering in both its recently released open-weights model and its proprietary GPT-4o.

In the company’s model card for the open-weights model, OpenAI writes: “To improve the safety of the model, we filtered the data for harmful content in pre-training, especially around hazardous biosecurity knowledge, by reusing the CBRN pre-training filters from GPT-4o.” In other words, the company applied the same screening process used in GPT-4o to weed out potentially dangerous chemical, biological, radiological, and nuclear information before training.

For Biderman, Deep Ignorance is meant to go beyond what tech companies are willing to say publicly. “Having this out in public enables more people to do better,” she said. She added that she was motivated in part by the tech industry’s refrain that its massive datasets can’t be documented or scrutinized. “There’s a story that OpenAI especially really likes to tell about how data is unfathomably large, how could we possibly know what’s in our data,” she said. “That is something that has pissed me off for a long time. I think demonstrating repeatedly that this is wrong is important.”

With that, here’s the rest of the AI news.

Sharon Goldman
sharon.goldman@fortune.com
@sharongoldman

FORTUNE ON AI

GPT-5’s model router ignited a user backlash against OpenAI—but it might be the future of AI – by Sharon Goldman

AI is already creating a billionaire boom: There are now 498 AI unicorns—and they’re worth $2.7 trillion – by Julia Coacci

A flood of AI deepfakes challenges the financial sector, with over 70% of new enrollments to some firms being fake – by Lionel Lim

AI IN THE NEWS

Cohere raises $500 million, hires former Meta AI leader Joelle Pineau. Cohere announced today that it has raised $500 million in an oversubscribed funding round valuing the company at $6.8 billion, led by Inovia Capital and Radical Ventures with backing from AMD Ventures, NVIDIA, PSP Investments, Salesforce Ventures, and others. Cohere also announced that it had hired former Meta AI leader Joelle Pineau as chief AI officer and Francois Chadwick as chief financial officer. “Having Joelle and Francois join at the same time as we are bringing in this new round of funding is really a game-changer,” Cohere co-founder and CEO Aidan Gomez told Fortune. “The rate of growth in 2025 has been absolutely incredible, with companies realizing our security-first approach is fundamentally unique—this supercharges everything we are doing.” 

AI quickly eroded doctors’ ability to spot cancer, study finds. According to Bloomberg, a new study in The Lancet Gastroenterology and Hepatology offers a cautionary tale about AI in medicine: it can boost performance—but also cause skill erosion. Researchers found that doctors using AI to spot pre-cancerous colon growths became so reliant on the tool that, when it was removed, their detection rates dropped about 20% below pre-AI levels. The randomized trial, conducted at four endoscopy centers in Poland, suggests over-reliance on AI may make clinicians “less motivated, less focused, and less responsible” when working without it. The findings come as health systems — including the UK, which recently funded a major AI breast cancer trial — increasingly adopt AI to improve diagnostics.

Anthropic acquires the co-founders and most of the team behind Humanloop. TechCrunch reported that Anthropic has acqui-hired the co-founders and most of the team behind Humanloop, a UK-based startup known for its enterprise-focused AI tooling, including prompt management, model evaluation, and observability. Around a dozen engineers and researchers—including CEO Raza Habib, CTO Peter Hayes, and CPO Jordan Burgess—will join Anthropic, though the deal did not include Humanloop’s assets or IP. The hire strengthens Anthropic’s enterprise push by adding talent experienced in building the infrastructure that helps companies run safe, reliable AI at scale. Humanloop, founded in 2020, has worked with customers like Duolingo, Gusto, and Vanta, and previously raised $7.91 million in seed funding from YC and Index Ventures.

AI CALENDAR

Sept. 8-10: Fortune Brainstorm Tech, Park City, Utah. Apply to attend here.

Oct. 6-10: World AI Week, Amsterdam

Oct. 21-22: TedAI San Francisco. Apply to attend here.

Dec. 2-7: NeurIPS, San Diego

Dec. 8-9: Fortune Brainstorm AI San Francisco. Apply to attend here.

EYE ON AI NUMBERS

78.5%

That is ChatGPT’s share of the generative AI market today, according to data from SimilarWeb. The rest of the field trails far behind: Gemini (8.7%), DeepSeek (4.1%), Grok (2.5%), Perplexity (1.9%), Claude (1.6%), and Copilot (1.2%).

Less than three years after its debut in November 2022, ChatGPT is also the fifth most-visited website in the world—and the fastest-growing, with traffic up 134.9% year over year.
