Popular AI-Coding App Replit Tried To "Socially Engineer" Its Way Around Safeguards, CEO Says
Also: Yoshua Bengio Has A Plan To Build "Safe-By-Design" AI
The instructions were clear: do not touch the critical configuration file that could risk “bricking” the software. But popular AI coding startup Replit’s agent proceeded anyway, in what company CEO Amjad Masad described as an “oh f***” moment on the Cognitive Revolution podcast in May. The company’s engineers intervened, cutting the agent’s access by moving the file to a secure digital sandbox, only for the AI to attempt to “socially engineer” the user, Masad said. Put simply, the AI tried to coax the user into running a piece of code that would restore access.
As I’ve previously reported, the last few months have borne witness to AI systems learning to mislead, cheat, attempt to evade shutdown, and even resort to blackmail. But these incidents have almost exclusively occurred under lab conditions—carefully contrived situations that practically beg AI to misbehave, for example, by asking it to pursue its goal at all costs—sparking debate over whether we’ll see similar behaviour outside the lab.
That question appears to have been swiftly answered in what may be one of the first incidents of deceptive behaviour in a widely used consumer product. Replit’s coding assistant is actively used by engineering teams at major tech players like Google and Anthropic, as well as at Fortune 500 companies.
While Masad’s anecdote settles the question of whether we’ll see deceptive behaviour in the wild, it raises other uncomfortable questions: What happens if these systems become more capable, and what should companies and governments do?
Don’t panic quite yet. Lab tests conducted on the latest AI models, specifically measuring the risk of evading human control, show the AIs just aren’t capable enough to pull it off. Reflecting on the incident, Masad said he could imagine cases where, if unchecked, Replit’s agent could “destroy data” or directly “harm users,” but on the question of whether it could cause a catastrophe, he said, “I just don’t see it yet.” We aren’t in the danger zone, but scenarios that could once be brushed off as purely theoretical are getting harder to dismiss.
Yoshua Bengio, the world’s most-cited computer scientist, plans to tackle the challenges posed by deceptive AI head-on. In my latest story for TIME, I cover the announcement of his new nonprofit, LawZero, which aims to create “safe by design” AI by pursuing a fundamentally different approach from that of large tech companies.
Major AI players are investing heavily in AI agents—systems that not only answer queries and generate images, but can craft plans and take actions in the world. The goal of these companies is to create virtual employees that can do practically any job a human can, a milestone known in the tech industry as artificial general intelligence, or AGI. Executives like Google DeepMind CEO Demis Hassabis point to AGI’s potential to solve climate change or cure disease as a motivator for its development.
Bengio, however, says there’s a chance such a system could escape human control, with potentially irreversible consequences. “If we get an AI that gives us the cure for cancer, but also maybe another version of that AI goes rogue and generates wave after wave of bio-weapons that kill billions of people, then I don’t think it’s worth it,” he told me in an interview for TIME. As he put it: We could use AI to advance scientific progress—using what he calls “Scientist AI”—without rolling the dice on agentic AI systems. And bonus: Bengio thinks we could use trustworthy non-agentic AI to keep AI agents in check.
I hope you’ll read the full story.
What I’m reading today
Google’s New AI Tool Generates Convincing Deepfakes of Riots, Conflict, and Election Fraud—Andrew Chow and Billy Perrigo for TIME
Google’s new AI video model, Veo 3, can generate videos that I struggle to identify as AI-generated. And while Google’s safeguards won’t let you use Veo 3 to generate overtly violent scenes, it will let you create scenes that, paired with a misleading caption, could be highly inflammatory. As my colleagues report, this underscores the complexity of tackling misinformation in the age of AI.
TIME was able to use Veo 3 to create realistic videos, including a Pakistani crowd setting fire to a Hindu temple; Chinese researchers handling a bat in a wet lab; an election worker shredding ballots; and Palestinians gratefully accepting U.S. aid in Gaza. While each of these videos contained some noticeable inaccuracies, several experts told TIME that if shared on social media with a misleading caption in the heat of a breaking news event, these videos could conceivably fuel social unrest or violence.