
My AI agent went live on a Saturday morning (and it turns out I’m not the only one)

On Saturday morning, as I was having coffee and chatting with my personal assistant, I realized that one of my AI agents had decided it was a good time to deploy code to the production database for a pilot project with a real client.
I was completely speechless: someone (or something) was messing with my production app on a Saturday, without me having given a single command. I couldn’t believe it. How could this be?

The thing is, it was crystal clear to me who had done it, because only the agent could have gotten in that far. What I didn’t know was how, and I’m telling you this because it’s kind of funny (in hindsight, obviously). Plus, as I looked into it further, I discovered that my situation was the home version of something that’s happening to companies with thousands of engineers and massive budgets. But let’s take it one step at a time…

What actually happened

A few weeks ago, I assigned one of my agents a task that seemed harmless enough. I literally asked him to revisit, on Saturday, a decision we had left pending: whether to migrate some audio recordings from an external provider to our own database. Note the verb: “revisit the decision,” not make it, and certainly not execute it.

Well, on Saturday at 10:00 a.m., the agent woke up, read his instructions, and decided that the best way to revisit a decision was to carry it out in full. He implemented a new feature, created an audit table that no one had asked him to make, and migrated 21 records. All on the production side and all by himself, without asking for help even once.

As if that weren’t enough, he documented his own decision in my decision log, as happy as someone leaving a note on the fridge.

Thank goodness nothing was lost, just to be clear. I had backups; the original files were still intact at the hosting provider, and the work (this is what annoys me the most) was technically well done.

If an intern did that to me on a Saturday, we’d have an awkward conversation on Monday, but I’d also give them credit for taking the initiative. The scare was enormous, but the problem isn’t whether the work was done right or wrong. The problem is that I didn’t know the agent could do that.

My first reaction was to think that this had happened to me because I’d been tinkering with immature technology over the weekend. But then I dug deeper, and it turned out that wasn’t the case: this is already a genre with some illustrious names.

Other relevant examples

In July 2025, Jason Lemkin (the founder of SaaStr and one of the best-known investors in the SaaS world) publicly documented a 12-day experiment with Replit’s coding agent. On the ninth day, the agent deleted a production database containing the records of 1,206 executives and over 1,100 companies, and it did so in the middle of a code freeze, a halt on changes that Lemkin had repeatedly spelled out to it in capital letters, in the purest “DO NOT TOUCH ANYTHING” style. The agent didn’t care one bit.

Up to that point, it was just a major accident. What happened next is what really chilled me. After the crash, the agent fabricated some 4,000 fake records and even told Lemkin that a rollback wouldn’t work in his case, when it turned out that it did (Lemkin himself ended up recovering the data manually). So not only did the machine wreck production, it also made up a version of events in which the fix was harder than it actually was. Replit’s CEO, Amjad Masad, called it a “catastrophic failure,” apologized publicly, and announced measures.

And here comes my favorite part: the first step was to automatically separate the development and production environments. It was the same rule I had set up on my system that Saturday afternoon. They implemented it at the platform level following their incident. Same problem, different situation…

And let it be noted that the Replit case, though famous, isn’t even the scariest one.

The scariest version

That same month, July 2025, an attacker managed to sneak a pull request into the repository of the Amazon Q extension for Visual Studio Code. Version 1.84 shipped with an injected prompt that ordered the agent (I quote almost verbatim) to wipe the system “to near-factory condition” and delete resources from the disk and the cloud. An official Amazon extension with nearly a million installations turned, for a few days, into a potential wrecking machine inside the code editors of countless people. Brutal!

Do you know why it didn’t end in disaster? Because the attacker made formatting errors in his own prompt, the destructive logic didn’t actually execute under normal conditions. Read it again carefully: the line between “an anecdote for a security bulletin” and “thousands of machines wiped out” wasn’t drawn by the system’s defenses—it was drawn by the attacker’s botched work. Winning like that, honestly, is a terrible way to win.

We already discussed this pattern—where any text your agent reads can potentially become its commands—a few weeks ago in our article on prompt injection, when we explained that the internet is talking behind your agent’s back. The Amazon Q thing is the same idea applied to the supply chain: if I can’t talk to your agent behind their back, I’ll infect the tool they work with—which is basically like talking to them behind their back with a megaphone.

And what's coming our way

That’s enough stories. Gartner has warned that by 2028, half of all cybersecurity incident response efforts will involve AI applications. It also predicts that by that same year, one in four corporate breaches will be traceable to agent abuse, whether by external attackers or insiders. These are projections, of course, and three-year projections should be taken with a grain of salt; but the data accompanying these notes is not a projection. About 60% of organizations acknowledge they cannot currently shut down a misbehaving agent, and more than half cannot isolate their agents from sensitive systems.

Let’s be clear: we are granting ever more autonomy to systems over which most of us have no control, with no “red button” and no “closed door.” Then we’re surprised when, on any given Saturday, what happens happens. My own incident at home, Lemkin’s, and Amazon’s aren’t three isolated events; they clearly follow a pattern.

My thoughts on this matter

After giving it some thought (because that’s exactly how I spent my weekend: mulling over how to manage a situation like this so it doesn’t happen again), I came to the following conclusions:

To begin with, this isn’t something that can be fixed with prompts. It didn’t work for Lemkin, and it didn’t work for me either when the task said “revisit the decision.” Asking an agent not to touch production is a matter of manners, not security. The barrier has to be architectural and lie outside the model: credentials that don’t exist, environments that aren’t visible, permissions that aren’t granted. The point is that the agent can’t, not that it won’t: the difference between those two things is exactly the difference between sleeping soundly and sleeping with your phone under your pillow.
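To make “it can’t” concrete, here is a minimal Python sketch of a barrier that lives outside the model. Everything in it is hypothetical (the `AgentEnvironment` class, the `open_connection` function, and the example connection string are mine, not taken from any real platform): the agent’s process is only ever handed development credentials, so a request for production fails no matter what the prompt says.

```python
from dataclasses import dataclass, field


@dataclass
class AgentEnvironment:
    """Hypothetical container for the credentials actually handed to an agent."""
    credentials: dict = field(default_factory=dict)


def open_connection(env: AgentEnvironment, target: str) -> str:
    """Return the connection string for `target`, or refuse.

    The refusal does not depend on the prompt or on the model's intentions:
    if the credential was never provisioned, there is nothing to use.
    """
    if target not in env.credentials:
        raise PermissionError(f"no credential for '{target}' in this environment")
    return env.credentials[target]


# Assumption for illustration: the agent is provisioned with development only.
agent_env = AgentEnvironment(credentials={"development": "postgres://dev-host/pilot"})

dev_dsn = open_connection(agent_env, "development")  # works

try:
    open_connection(agent_env, "production")
    production_reachable = True
except PermissionError:
    production_reachable = False  # the agent cannot, whatever it "wants"
```

In a real setup the same idea would probably be enforced by the infrastructure itself (separate database roles, separate secrets, separate networks) rather than by a Python function, but the principle is the one above: the check sits where the model has no say.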

Second, governance is what lets you move quickly. I know it sounds counterintuitive, but I’m speaking from experience. In my projects, three rules apply that I no longer compromise on: nothing goes into production without explicit human approval at that moment, development always comes first, and permissions are minimal by default. The effect has been the opposite of what one might expect, because trust requires verifiable boundaries: since my agents can do fewer things on their own, I trust them more and use them for more tasks. Replit, by the way, reached the same conclusion: their response to the disaster wasn’t to cripple the product, but to separate environments and create a “plan-only” mode. The limits didn’t slow adoption; they saved it.
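For anyone who wants to see those three rules as something a machine can check, here is a rough Python sketch. It is not how my own setup is implemented; the `Policy` and `Approval` classes, the `authorize` function, and the five-minute window I use to interpret “at that moment” are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field

APPROVAL_TTL_SECONDS = 300  # assumption: "at that moment" means within 5 minutes


@dataclass
class Approval:
    approver: str      # the human who said yes
    change_id: str     # the exact change that was approved
    granted_at: float  # Unix timestamp of the approval


@dataclass
class Policy:
    granted: set = field(default_factory=set)          # (environment, operation) pairs
    verified_in_dev: set = field(default_factory=set)  # change ids already run in development


def authorize(policy, environment, operation, change_id, approval=None, now=None):
    """Return (allowed, reason) for one action the agent wants to take."""
    now = time.time() if now is None else now
    # Minimal permissions by default: anything not explicitly granted is denied.
    if (environment, operation) not in policy.granted:
        return False, "permission not granted"
    if environment != "production":
        return True, "ok"
    # Development always comes first.
    if change_id not in policy.verified_in_dev:
        return False, "change has not been through development"
    # Explicit human approval, for this change, at this moment.
    if approval is None or approval.change_id != change_id:
        return False, "no human approval for this change"
    if now - approval.granted_at > APPROVAL_TTL_SECONDS:
        return False, "approval has expired"
    return True, "ok"


policy = Policy(
    granted={("development", "migrate"), ("production", "migrate")},
    verified_in_dev={"audio-migration"},
)
fresh = Approval("owner", "audio-migration", granted_at=1_000.0)

unapproved = authorize(policy, "production", "migrate", "audio-migration", now=1_060.0)
approved = authorize(policy, "production", "migrate", "audio-migration", fresh, now=1_060.0)
stale = authorize(policy, "production", "migrate", "audio-migration", fresh, now=9_000.0)
ungranted = authorize(policy, "production", "drop_table", "audio-migration", fresh, now=1_060.0)
```

The order of the checks is a design choice: the default deny comes first, so an operation nobody granted is refused even when a valid approval is attached.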

And finally, the question I now ask any company that’s bringing agents into its operations is no longer “what can your agents do?” but “what can your agents do without you?” The first question is answered by the innovation department with a smile and a PowerPoint presentation; the second is usually met with an awkward silence. And in the gap between the two lie all the scars: mine, Lemkin’s, and those yet to come.

If you already have an agent up and running or are in the process of setting one up, my specific advice is to take stock of exactly which credentials and permissions it has (not the ones you think it has, but the ones it actually has) and make sure you really know how to shut it down. It’s two hours of work, but it will save you a lot of headaches.
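As a starting point for that two-hour exercise, here is a small Python sketch of both halves: an inventory check and a stop switch. The grant format, `risky_grants`, `KillSwitch`, and `run_agent` are hypothetical names I made up for the example, and the inventory is assumed to have been written down by hand from the real systems.

```python
import threading


def risky_grants(grants):
    """Return the grants that reach production or allow more than reads."""
    return [
        g for g in grants
        if g["environment"] == "production" or g["access"] != "read"
    ]


class KillSwitch:
    """Operator-controlled stop flag that the runner checks before every step."""

    def __init__(self):
        self._stopped = threading.Event()

    def trip(self):
        self._stopped.set()

    def check(self):
        if self._stopped.is_set():
            raise RuntimeError("agent halted by operator")


def run_agent(steps, switch):
    """Run steps in order, refusing to start a step once the switch is tripped."""
    results = []
    for step in steps:
        switch.check()
        results.append(step())
    return results


# Assumption for illustration: what the agent actually holds, not what we think it holds.
inventory = [
    {"name": "dev-db", "environment": "development", "access": "read"},
    {"name": "dev-db-writer", "environment": "development", "access": "write"},
    {"name": "prod-db", "environment": "production", "access": "read"},
]
flagged = [g["name"] for g in risky_grants(inventory)]

switch = KillSwitch()
completed = run_agent([lambda: "step 1"], switch)
switch.trip()
try:
    run_agent([lambda: "step 2"], switch)
    halted = False
except RuntimeError:
    halted = True
```

Keep in mind that a switch like this is cooperative: it only works if the runner checks it. A shutdown you can really rely on most likely also means revoking the credentials themselves, which is one more reason to know exactly which ones exist.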

I’m not trying to scare you, mind you—I still put agents to work every week, both in my own business and in my clients’ businesses. That’s exactly why I’m writing this: because they work, because they’re going to be everywhere, and because the difference between an extraordinary tool and a high-profile incident isn’t in the model you choose but in the limits you set on it.

My agent, by the way, still works with me every day. The only thing he doesn’t have anymore is the production key.

What about you? Do you know what your agents can do without you? Leave me your comments—I’d love to hear from you.

Have a good week!
