Just want to clarify, this is not my Substack, I’m just sharing this because I found it insightful.
The author describes himself as a “fractional CTO” (no clue what that means, don’t ask me) and advisor. His clients asked him how they could leverage AI, so he decided to experience it for himself. From the author (emphasis mine):
I forced myself to use Claude Code exclusively to build a product. Three months. Not a single line of code written by me. I wanted to experience what my clients were considering—100% AI adoption. I needed to know firsthand why that 95% failure rate exists.
I got the product launched. It worked. I was proud of what I’d created. Then came the moment that validated every concern in that MIT study: I needed to make a small change and realized I wasn’t confident I could do it. My own product, built under my direction, and I’d lost confidence in my ability to modify it.
Now when clients ask me about AI adoption, I can tell them exactly what 100% looks like: it looks like failure. Not immediate failure—that’s the trap. Initial metrics look great. You ship faster. You feel productive. Then three months later, you realize nobody actually understands what you’ve built.

So if it can be vibe coded, it almost certainly already exists as a “thing”, but with some awkwardness.
Maybe what you need is a combination of two utilities, maybe the interface is very awkward for your use case, maybe you have to make a tiny compromise because it doesn’t quite match.
Maybe you want a little utility to do stuff with media. Now you could navigate your way through ffmpeg and mkvextract, which together handle what you want, with some scripting to keep you from having to remember the specific invocations among the myriad things those utilities do. An LLM could probably knock that script out for you quickly without your having to delve too deeply into either project’s documentation.
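A minimal sketch of that kind of glue script, assuming mkvextract and ffmpeg are on PATH (the file names and track number are made up for illustration):

```python
#!/usr/bin/env python3
"""Tiny glue script of the kind an LLM can knock out quickly:
wraps mkvextract and ffmpeg so you don't have to remember their flags."""
import subprocess
import sys

def extract_track(mkv_path: str, track_id: int, out_path: str) -> None:
    # mkvextract's "tracks" mode takes TID:filename pairs.
    subprocess.run(["mkvextract", mkv_path, "tracks", f"{track_id}:{out_path}"],
                   check=True)

def remux_to_mp4(mkv_path: str, out_path: str) -> None:
    # -c copy remuxes the streams without re-encoding: fast and lossless,
    # as long as the codecs are MP4-compatible.
    subprocess.run(["ffmpeg", "-i", mkv_path, "-c", "copy", out_path],
                   check=True)

if __name__ == "__main__":
    src = sys.argv[1]                    # e.g. movie.mkv
    extract_track(src, 2, "subs.srt")    # assumes track 2 holds subtitles
    remux_to_mp4(src, "movie.mp4")
```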
I’ll put it this way: automated translation has been getting pretty good over the past 20 years, and LLMs are the latest step. Sure, human translators still look down their noses at “automated translations” but, in the real world, an automated translation gets the job done well enough most of the time.
LLMs are also pretty good at translating code, say from C++ to Rust. Not million-line code bases, but they handle the little concepts pretty well.
On a completely different tack, I’ve been pretty happy with LLM-generated parsers. Like: I’ve got 1000 log files here, and I want to know how many times these lines appear. You’ve got grep for that. But write me a utility that finds all occurrences of these lines, reads the timestamps, and then searches for any occurrences of these other lines within +/- 1 minute of the first ones… grep can’t really do that, but a 5-minute vibe-coded parser can.
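Roughly the sort of thing I mean (a sketch; the log line format, the marker strings, and the logs/ directory are made-up placeholders):

```python
#!/usr/bin/env python3
"""Sketch of the kind of parser described above: count anchor lines and
check for companion lines within +/- 1 minute of each one."""
import bisect
import re
from datetime import datetime, timedelta
from pathlib import Path

TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(.*)$")
ANCHOR = "connection reset"   # the lines you're counting
NEARBY = "retry scheduled"    # the lines you want within +/- 1 minute
WINDOW = timedelta(minutes=1)

def parse(path):
    """Yield (timestamp, message) pairs, skipping unparseable lines."""
    for line in path.read_text(errors="replace").splitlines():
        m = TS_RE.match(line)
        if m:
            yield datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"), m.group(2)

anchors, others = [], []
for path in Path("logs").glob("*.log"):
    for ts, msg in parse(path):
        if ANCHOR in msg:
            anchors.append(ts)
        elif NEARBY in msg:
            others.append(ts)

others.sort()

def has_nearby(ts):
    # Binary-search the sorted timestamps for anything in [ts-1min, ts+1min].
    i = bisect.bisect_left(others, ts - WINDOW)
    return i < len(others) and others[i] <= ts + WINDOW

hits = sum(1 for a in anchors if has_nearby(a))
print(f"{len(anchors)} anchor lines, {hits} with a nearby match")
```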
If I understand correctly, this means mostly adapting the interface?
It’s certainly a use case that an LLM has a decent shot at.
Of course, having said that, I gave it a spin with Gemini 3 and it just hallucinated a bunch of crap that doesn’t exist instead of properly identifying capable libraries or front-ending media tools…
But in principle, and on occasion, it can take care of little convenience utilities/functions like that. I still have no idea why some people claim to be able to ‘vibe code’ up anything of significance, though; even when I thought I was giving it an easy hit, it completely screwed things up…
I tried using Gemini 3 for OpenSCAD, and it couldn’t slice a solid properly to save its life; I gave up after about six attempts to put a 3:12-slope shed roof on four walls. Same job in Opus 4.5, and I’ve got a very nicely styled 600-square-foot floor plan with radiused 3D-concrete-printed walls, windows, doors, a shed roof with a 1’ overhang, and a Python script that translates the .scad to a good-looking .svg 2D floor plan.
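The .scad → .svg part can be as simple as shelling out to the openscad CLI twice; a rough sketch of that idea (file names and the cut height are placeholders, not my actual script):

```python
#!/usr/bin/env python3
"""Sketch: render a .scad model to STL, then take a horizontal cut with
projection(cut=true) and export the resulting 2D geometry as SVG.
Assumes the openscad CLI is on PATH."""
import subprocess
import tempfile
from pathlib import Path

def scad_to_svg(scad_file: str, svg_file: str, cut_z: float = 1.0) -> None:
    stl = Path(scad_file).with_suffix(".stl")
    # 1) Compile the model to a mesh.
    subprocess.run(["openscad", "-o", str(stl), scad_file], check=True)
    # 2) Slice it at height cut_z: projection(cut=true) yields 2D geometry,
    #    which OpenSCAD can export as SVG.
    wrapper = (f"projection(cut=true) translate([0, 0, {-cut_z}]) "
               f'import("{stl.resolve()}");')
    with tempfile.NamedTemporaryFile("w", suffix=".scad", delete=False) as f:
        f.write(wrapper)
    subprocess.run(["openscad", "-o", svg_file, f.name], check=True)

scad_to_svg("house.scad", "floorplan.svg")
```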
I’m sure Gemini 3 is good for other things, but Opus 4.5 makes it look infantile in 3D modeling.
Having used both Gemini and Claude… I use Gemini when I need to quickly find something I don’t want to waste time searching for, or I need a recipe found and then modified to fit what I have on hand.
Every time I’ve used Gemini for coding, it has ended in failure. It constantly forgets things, forgets what version of a package you’re using and so tells you to do something that’s deprecated; it was hell. I had to hold its hand the entire time and talk to it like it’s a stupid child.
Claude just works. I use Claude for so many things, both chat and API. I didn’t care about AI until I tried Claude. There’s a whole whack of novels by a Russian author I like, but they stopped translating the series. Claude vibe coded an app that reads the Russian ebooks and translates them chapter by chapter in a way that prevents context bleed. I can read any book in any language for about $2.50 in API tokens.
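The mechanics are simple enough; a rough sketch of the chapter-by-chapter idea using the official anthropic Python SDK (the model name, prompt, and chapter extraction are placeholder assumptions, not the actual app):

```python
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def translate_chapter(text: str) -> str:
    # One independent request per chapter: nothing from a previous
    # chapter sits in the context window, so there is no bleed.
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumed model name; use whatever is current
        max_tokens=8000,
        messages=[{
            "role": "user",
            "content": "Translate this Russian novel chapter into natural "
                       "English, preserving names and tone:\n\n" + text,
        }],
    )
    return response.content[0].text

chapters = ["...chapter 1 text...", "...chapter 2 text..."]  # pulled from the ebook
english = [translate_chapter(ch) for ch in chapters]
```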
I’ve been using Claude with mediocre results, so this time I used Gemini 3 because everyone in my company is screaming “this time it works, trust us bro”. Claude hasn’t been working so great for my day job either.
I think it really depends on the user and how you communicate with the AI. People are different, and we communicate differently. But if you’re precise, tell it what you want, and spell out what the expected result should be, it’s pretty good at filling in the blanks.
I can pull really useful code out of Claude, but ask me to think up a prompt to feed into Gemini for video creation and the results look like shit.
In my experience, the type of problem is the biggest source of differing results.
Ask for something consistent with very well-trodden territory, and it has a good shot. However, if you go off the beaten path to where it can’t credibly generate code, it generates anyway, making up function names, file paths, REST URLs and attributes, and whatever else sounds good and consistent with the prompt, with no connection to anything real.
It’s usually not that it does the wrong thing because it “misunderstood”; it’s usually that it produces very appropriate-looking code, consistent with the request, that has no link to reality, with no recognition that it has invented something nonexistent.
If it’s a fairly milquetoast web UI manipulating a SQL backend, it tends to chew through that more reasonably (though in various results I’ve tried, it has screwed up a fundamental security principle: once I saw it suggest a weird custom certificate validation and disable the default validation while transmitting sensitive data, before ever meaningfully executing the custom validation).
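To make that concrete, the anti-pattern looks roughly like this in Python/requests terms (names are illustrative; this is what not to ship):

```python
"""The anti-pattern described above, sketched with requests.
custom_cert_check is a stand-in for the model's invented validation."""
import requests

def custom_cert_check(url: str) -> bool:
    # Stand-in for the homegrown "validation" the model dreamed up.
    return True

def fetch_sensitive(url: str) -> bytes:
    # verify=False disables the default TLS certificate validation,
    # leaving the connection open to man-in-the-middle attacks...
    resp = requests.get(url, verify=False)
    # ...and the custom check runs AFTER the sensitive data has already
    # gone over the wire, so it protects nothing.
    if not custom_cert_check(url):
        raise RuntimeError("certificate check failed (too late!)")
    return resp.content

# The fix is simply not to do any of this:
# requests.get(url) validates the certificate by default.
```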