Whether it's vibe coding, agentic coding, or copy-pasting from the web interface to your editor, it's still sad to see the normalization of private (i.e., paid) LLM models. I like the progress that LLMs introduce and I see them as a powerful tool, but I cannot understand how programmers (whether complete nobodies or popular figures) don't mind adding a strong dependency on a third party in order to keep programming. Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools. I am afraid that in a few years, that will no longer be possible (as in most programmers will be so tied to a paid LLM, that not using them would be like not using an IDE or vim nowadays), since everyone is using private LLMs. The excuse "but you earn six figures, what's $200/month to you?" doesn't really capture the issue here.
> The excuse "but you earn six figures, what's $200/month to you?" doesn't really capture the issue here.
Just like every other subscription model, including the one in the Black Mirror episode, Common People. The value is too good to be true for the price at the beginning. But you become their prisoner in the long run, with increasing prices and degrading quality.
I don't think it's subscriptions so much as consumer startup pricing strategies:
Netflix/Hulu were "losing money on streaming"-level cheap.
Uber was "losing money on rides"-level cheap.
WeWork was "losing money on real estate"-level cheap.
Until someone releases wildly profitable LLM company financials it's reasonable to expect prices to go up in the future.
Course, advances in compute are much more reasonable to expect than advances in cheap media production, taxi driver availability, or office space. So there's a possibility it could be different. But that might require capabilities to hit a hard plateau so that the compute can keep up. And that might make it hard to justify the valuations some of these companies have... which could also lead to price hikes.
But I'm not as worried as others. None of these have lock-in. If the prices go up, I'm happy to cancel or stop using it.
For a current student or new grad who has only ever used the LLM tools, this could be a rougher transition...
Another thing that would change the calculation is if it becomes impossible to maintain large production-level systems competitively without these tools. That's presumably one of the things the companies are betting on. We'll see if they get there. At that point many of us probably have far bigger things to worry about.
There is a reason why companies throw billions into AI and are still not profitable. They must be the first ones to hook users in the long run and make the service a necessary part of the user’s life. And then increase the price.
The argument is perhaps ”enshittification”, and that becoming reliant on a specific provider or even set of providers for ”important thing” will become problematic over time.
> Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools.
Yet JetBrains has been a business longer than some of my colleagues have been alive, and Microsoft’s Visual Basic/C++/Studio made writing software for Windows much easier, and did not come cheap.
I see a big difference: I do use JetBrains IDEs (they are nice), but I can switch to vim (or vscode) any time if I need to (e.g., let's say JetBrains increases their prices to a point that doesn't make sense, or perhaps they introduce a pervasive feature that cannot be disabled). The problem with paid LLMs is that one cannot easily switch to open-source ones (because they are not as good as the paid ones). So, it's a dependency that cannot be avoided, and that's imho something that shouldn't be overlooked.
I don't contribute to vim specifically, but I do contribute to other open source projects. So, I do like to keep this balance between making open source tools better over time and using paid alternatives. I don't think that's possible though with LLMs at the moment (and I don't think it would be possible in the future, but of course I could be wrong).
I was a hardcore vim user 10 years ago, but now I just use PyCharm to work. I'm paid to solve problems, not to futz around with vim configs.
Can you make vim work roughly the same way? Probably you can get pretty close. But how many hours do I have to sink into the config? A lot. And suddenly the PyCharm license is cheap.
And it's exactly the same thing with LLMs. You want hand crafted beautiful code, untainted by AI? You can still do that. But I'm paid to solve problems. I can solve them faster/solve more of them? I get more money.
> I was a hardcore vim user 10 years ago, but now I just use PyCharm to work. I'm paid to solve problems, not to futz around with vim configs.
The reason I don't like those arguments is that they conflate two orthogonal things: solving problems and optimizing your tooling. You can optimize PyCharm just as much as you can fiddle with Vim's config. And people are solving problems with Vim just as you do with an IDE. It's just a matter of preference.
In my day job, I have two IDEs, VSCode, and Emacs open. I prefer Emacs for editing and git usage, but there are a few things that only the IDEs can do (as in I don't bother setting up Emacs to do the same), and VSCode is there because people get dizzy with the way I switch buffers in Emacs.
Open-weight and open-source LLMs are improving as well. While there will likely always be a gap between closed, proprietary models and open models, at the current pace the capabilities of open models could match today’s closed models within months.
I don't think so. Let's do a silly experiment: antirez, could you ditch Gemini 2.5 PRO and Claude Opus 4, and instead use llama? Like never again go back to Gemini/Claude. I don't think he can (I don't think he would want to). And this is not on antirez, this is on everyone who's paying for LLMs at the moment: they are paying for them because they are so damn good compared to the open source ones... so there's no incentive to switch. But again, that's like climate change: there's no incentive to pollute less (well, perhaps to save us, but money is more important).
Ah, there are community editions of the most important tools (and have been for 10+ years), and I doubt e.g. MS will close the VS.NET Community Version in the future.
If the models are getting cheaper, better, and freer even when we use paid ones, then right now is the time to develop techniques, user interfaces, and workflows that become the inspirations and foundations of a future world of small, local, and phenomenally powerful models that have online learning, that can formalize their reasoning, that can bake deduction into their own weights and code.
> Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools. I am afraid that in a few years, that will no longer be possible .. The excuse "but you earn six figures, what's $200/month to you?" doesn't really capture the issue here.
One aspect of this that's really sad to think about is, coding (and to a lesser extent IT in general) at one point was a real meritocracy, where skill mattered more than expensive/unnecessary academic pedigree. Not perfect of course, but real nevertheless. And coders were the first engineers who really said "I won't be renting a suit for an interview, I think an old t-shirt is fine" and we normalized that. Part of this was just uncompromisingly practical.. like you can either do the work or not, and fuck the rest of that noise. But there was also a pretty punk aspect to this for many people in the industry.. some recognition that needing to have money to make money was a bullshit relic of closeted classism.
But we're fast approaching a time where both the old metrics (how much quality code are you writing how fast and what's your personal open source portfolio like?) and the new metrics (are you writing a blog post every week about your experience with the new models, is your personal computer fast enough to even try to run crappy local models?) are both going to favor those with plenty of money to experiment.
It's not hard to see how this will make inequality worse and disadvantage junior devs, or just talented people that didn't plan other life-events around purchasing API credits/GPUs. A pay-to-play kind of world was ugly enough in politics and business so it sucks a lot to see it creeping into engineering disciplines but it seems inevitable.
Yes, and what is worse is that the same mega-corporations who have been ostensibly promoting equity until 2025 are now pushing for a gated development environment that costs the same as a monthly rent in some countries or more than a monthly salary in others.
That problem does not even include lock-in, surveillance, IP theft and all other things that come with SaaS.
It’s not that bad: K2 and DeepSeek R1 are at the level of the frontier models of one year ago (K2 may be even better: I have enough experience only with V3/R1). We will see more coming, since LLMs are incredibly costly to train but very simple in their essence (it’s as if their fundamental mechanics were built into the physical nature of computation itself), so the barrier to entry is large but not insurmountable.
Ad-free search doesn't by itself produce a unique product. It's just a product that doesn't have noise, noise that people with attention spans and focus don't experience at all.
Local models are not quite there yet. For now, use the evil bad tools to prepare for the good free tools when they do get there. It's a self-correcting form of technical debt that we will never have to pay down.
Of course they are. I wouldn't expect otherwise :)
But the price we're paying (and I don't mean money) is very high, imho. We all talk about how good engineers write code that depends on high-level abstractions instead of low-level details, allowing us to replace third party dependencies easily and test our apps more effectively, keeping the core of our domain "pure". Well, isn't it time we started doing the same with LLMs? I'm not talking about MCP, but rather an open source tool that can plug into either free and open source LLMs or private ones. That would at least allow us to switch to a free and open-source version if the companies behind the private LLMs go rogue. I'm afraid though that wouldn't be enough, but it's a starting point.
To give an example: what would you think if you needed to pay for every single Linux process on your machine? Or for every Git commit you make? Or for every debugging session you perform?
> an open source tool that can plug into either free and open source LLMs or private ones
Fortunately there are many of these that can integrate with offline LLMs through systems like LiteLLM/Ollama/etc. Off the top of my head, I'd look into Continue, Cline and Aider.
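And since most of those can talk to anything that speaks the OpenAI-style API (Ollama serves one locally on its default port), the "thin abstraction" the parent asks for can be tiny. A rough sketch, where the model names, keys and URLs are placeholders rather than recommendations:

    # Minimal sketch: the same chat call against either a hosted or a local backend.
    # Assumes `pip install openai`; for the local case, an Ollama server on its
    # default port. Model names, keys and URLs are placeholders.
    from openai import OpenAI

    BACKENDS = {
        "hosted": {"base_url": "https://api.openai.com/v1", "api_key": "sk-...", "model": "gpt-4o"},
        "local":  {"base_url": "http://localhost:11434/v1", "api_key": "ollama", "model": "llama3.1"},
    }

    def ask(backend: str, prompt: str) -> str:
        cfg = BACKENDS[backend]
        client = OpenAI(base_url=cfg["base_url"], api_key=cfg["api_key"])
        resp = client.chat.completions.create(
            model=cfg["model"],
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # Switching from the paid provider to the local model is a one-word change:
    # ask("local", "Explain this stack trace: ...")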
> I'm not talking about MCP, but rather an open source tool that can plug into either free and open source LLMs or private ones. That would at least allow us to switch to a free and open-source version if the companies behind the private LLMs go rogue. I'm afraid though that wouldn't be enough, but it's a starting point.
There are open source tools that do exactly that already.
Ah, well that's nice. But every single post I read doesn't mention them. So I assume they are not popular for some reason. Again, my main point here is: the normalization of using private LLMs. I don't see anyone talking about it; we are all just handing over a huge part of what it means to build software to a couple of enterprises whose goal is, of course, to maximize profit. So, yeah, perhaps I'm overthinking, I don't know; I just don't like that now these companies are so ingrained in the act of building software (just like AWS is so ingrained in the act of running software).
Because the models are so much worse that people aren't using them.
Philosophical battles don't pay the bills and for most of us they aren't fun.
There have been periods of my life where I stubbornly persisted using something inferior for various reasons - maybe I was passionate about it, maybe I wanted it to exist and was willing to spend my time debugging and offer feedback - but there a finite number of hours in my life and often I'd much rather pay for something that works well than throw my heart, soul, time, and blood pressure at something that will only give me pain.
> I'm not talking about MCP, but rather an open source tool that can plug into either free and open source LLMs or private ones.
Has someone computed/estimated the at-cost $$$ value of utilizing these models at full tilt: several messages per minute and at least 500,000-token context windows? What we need is a Wikipedia-like effort to support something truly open and continually improving in its quality.
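As a very rough back-of-envelope (every number below is an assumption for illustration, not anyone's actual pricing), resending a ~500,000-token context a few times a minute adds up quickly:

    # Back-of-envelope cost of driving a hosted model "at full tilt".
    # Every figure here is an assumption for illustration, not a real price quote.
    price_per_million_input_tokens = 3.00    # hypothetical $/1M input tokens
    context_tokens = 500_000                 # context resent with each message
    messages_per_minute = 3

    tokens_per_hour = context_tokens * messages_per_minute * 60
    cost_per_hour = tokens_per_hour / 1_000_000 * price_per_million_input_tokens
    print(f"{tokens_per_hour:,} tokens/hour -> ~${cost_per_hour:,.0f}/hour")
    # 90,000,000 tokens/hour -> ~$270/hour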
None of that applies here since we could all easily switch to open models at a moment's notice with limited costs. In fact, we switch between proprietary models every few months.
It just so happens that closed models are better today.
I personally can’t wait for programming to ‘die’. It has stolen a decade of my life, minimum. Like veterinarians trained to help pets, ultimately finding out a huge portion of the job is killing them. I was not sufficiently informed that I’d spend a decade arguing about languages, dealing with thousands of other developers with diverging opinions, legacy code, poorly (if at all) maintained libraries, tools, frameworks, etc. If you have been in the game at least a decade, please don’t @ me. Adios to programming as it was (happily welcoming a new DIFFERENT reality, whatever that means). Nostalgia is for life, not staring at a screen 8hrs a day.
Feels like this is a byproduct of a poor work-life balance more than an intrinsic issue with programming itself. I also can't really relate since I've always enjoyed discussing challenging problems with colleagues.
I'm assuming by "die" you mean some future where autonomous agentic models handle all the work. In this world, where you can delete your entire programming staff and have a single PM who tells the models what features to implement next, where do you imagine you fit in?
I just hope for your sake that you have a fallback set of viable skills to survive in this theoretical future.
You got some arguably rude replies to this but you're right. I've been doing this a long time and the stuff you listed is never the fun part despite some insistence on HN that it somehow is. I love programming as a platonic ideal but those moments are fleeting between the crap you described and I can't wait for it to go.
I've been programming professionally since 2012 and still love it. To me the sweet spot must've been the early mid 2000s, with good enough search engines and ample documentation online.
Did you expect computer programming not to involve this much time at a computer screen?
Most modern jobs especially in tech do. If it’s no longer fulfilling, it might be worth exploring a different role or field instead of waiting for the entire profession to change.
I understand your frustration but the problem is mostly people. Not the particular skill itself.
IMO it's not unlike all the other "dev" tools we use. There are tons of free and open tools that usually lag a bit behind the paid versions. People pay for JetBrains, for macOS, and even to search the web (Google ads).
You have very powerful open-weight models; they are just not the cutting edge. Even those you can't really run locally, so you'd have to pay a 3rd party to run them.
Also the competition is awesome to see: these companies are all trying hard to get customers and build the best model, driving prices down and giving you options. No one company has all of the power; it's great to see capitalism working.
LLMs are basically free? Yes, you're rate limited, but I have only just started paying for them now; before, I'd bounce around between the providers, but still for free.
> Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools.
Since when? It starts with computers, the main tool, and their architecture not being free, and goes from there. Major compilers used to not be free. Major IDEs used to not be free. For most things there were decent and (sometimes) superior free alternatives. The same is true for LLMs.
> The excuse "but you earn six figures, what's $200/month to you?" doesn't really capture the issue here.
That "excuse" could exactly capture the issue. It does not, because you chose to make it a weirder issue. Just as before: you will be free to either not use LLMs, or use open-source LLMs, or use paid LLMs. Just as before in the many categories that pertain to programming. It all comes at a cost that you might be willing to pay and that somebody else is free to not care that much about.
> Major compilers used to not be free. Major IDEs used to not be free.
There were and are a lot of non-free ones, but since the 1990s, GCC and interpreted languages and Linux and Emacs and Eclipse and a bunch of kinda-IDEs were all free, and now VS Code is one of the highest marketshare IDEs, and those are all free. Also, the most used and learned programming language is JS, which doesn't need compilers in the first place.
The original point was that there is some inherent tradition in programming being free, with a direct critique wrt LLMs, which apparently breaks that tradition.
And my point is that's simply not the case. Different products have always been not free, and continue to be not free. A recent example would be something like Unity, which is not entirely free but has competitors that are entirely free and open source. JetBrains is something someone else brought up.
Again: You have local LLMs and I have every expectation they will improve. What exactly are we complaining about? That people continue to build products that are not free and, gasp, other people will pay for them, as they always have?
There's never been anything stopping you from building your own
Soon there will be. The knowledge of how to do so will be locked behind LLMs, and other sources of knowledge will be rarer and harder to find as a result of everything switching to LLM use
For the past decades knowledge was "locked" behind search engines. Could you have rolled your own search engine indexing the web, to unlock that knowledge? Yes, in the same theoretical way that you can roll your own LLM.
There was never anything stopping you from finding other avenues than Search Engines to get people to find your website. You could find a url on a board at a cafe and still find a website without a search engine. More local sure, but knowledge had ways to spread in the real world when it needed to
How are LLMs equivalent? People posting their prompts on bulletin boards at cafes?
But what is (or will be) stopping you from finding avenues other than LLMs? You say other sources of knowledge will be rarer. But they will still exist, and I don't see why they will become less accessible than non-search-engine-indexed content is now.
> Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools.
Not without a lot of hard thankless work by people like RMS to write said tools. Programming for a long while was the purview of the Microsoft Visual Studio family, which cost hundreds, if not thousands, of dollars. There existed other options, some of which were free, but, as is the case today with LLMs you can run at home, they were often worse.
This is why making software developer tools is such a tough market and why debugging remains basically in the dark ages (though there are the occasional bright lights like rr). Good quality tools are expensive, for doctors and mechanics, why do we as software developers expect ours to be free, libre and gratis?
Why do you see this as a strong dependency? The beauty of it is that you can change the model whenever you want. You can even just code yourself! This isn't some no-code stuff.
I'm certain these are advertorials masquerading as personal opinions. These people are being paid to promote the product, either through outright cash, credits on their platform or just swag.
A lot of people are really bad at change. See: immigration. Short of giving everyone jazz improv lessons at school, there's nothing to be done.
To be fair, change is not always good. We still haven't fixed fitness/obesity issues caused (partly) by the invention of the car, 150 years later. I think there's a decent chance LLMs will have the same effect on the brain.
I'm going a little offtopic here, but I disagree with the OP's use of the term "PhD-level knowledge", although I have a huge amount of respect for antirez (besides the fact that we were born on the same island).
This phrasing can be misleading and points to a broader misunderstanding about the nature of doctoral studies, one that has been influenced by the marketing and hype discourse surrounding AI labs.
The assertion that there is a defined "PhD-level knowledge" is pretty useless. The primary purpose of a PhD is not simply to acquire a vast amount of pre-existing knowledge, but rather to learn how to conduct research.
Agree with that. Read it as expert-level knowledge, without all the other stuff LLMs can't do as well as humans. The LLM's way of expressing knowledge is kind of alien, as it is different, so indeed those are all poor simplifications. For instance an LLM can't code as well as a top human coder but can write a non trivial program from the first to the last character without iterating.
What sticks out to me is Gemini catching bugs before production release, was hoping you’d give a little more insight into that.
The reason being that we expect AI to create bugs and we catch them, but if Gemini is spotting bugs by some way of being a QA (not just by writing and passing tests) then that piques my interest.
Further, I always assumed PhD level of knowledge meant coming up with the right questions. I would say it is at best a "Lazy Knowledge Rich worker": it won't explore hypotheses if you don't *ask it* to. A PhD would ask those questions of *themselves*. Let me give you a simple example:
The other day Claude Code (Max Pro subscription) commented out a bunch of test assertions as part of a related but separate test suite it was coding. It did not care to explore why it was commenting them out, which was a serious bug caused by a faulty assumption in the original plan. I had to ask it to change the plan, using the ultra-think, think-hard trick, to explore why it was failing, amend the plan and fix it.
The bug was that the ORM object had null values because it was not refreshed after the commit and had been fetched earlier by another DB session that had since been closed.
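For anyone who hasn't hit this class of bug, a minimal self-contained sketch of the failure mode and the fix, with SQLAlchemy assumed as the ORM and an invented Job model:

    # Sketch of the stale/detached ORM object bug described above.
    # SQLAlchemy is assumed as the ORM; the Job model and its column are invented.
    from typing import Optional
    from sqlalchemy import String, create_engine
    from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

    class Base(DeclarativeBase):
        pass

    class Job(Base):
        __tablename__ = "jobs"
        id: Mapped[int] = mapped_column(primary_key=True)
        result: Mapped[Optional[str]] = mapped_column(String)

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    with Session(engine) as s:
        s.add(Job(id=1))
        s.commit()

    # Object fetched by a session that is then closed...
    old_session = Session(engine)
    job = old_session.get(Job, 1)
    old_session.close()

    # ...while another session updates the row and commits.
    with Session(engine) as s:
        j = s.get(Job, 1)
        j.result = "done"
        s.commit()

    print(job.result)                  # None: the detached object was never refreshed

    # Fix: re-read (or session.refresh) the object in a live session after the commit.
    with Session(engine) as s:
        print(s.get(Job, 1).result)    # "done"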
If you understand that a PhD is about much more than just knowledge, it's still the case that having easy access to that knowledge is super valuable. My last job we often had questions that would just traditionally require a PhD-level person to answer, even if it wasn't at the limit of their research abilities. "What will happen to the interface of two materials if voltage is applied in one direction" type stuff, turns out to be really hard to answer but LLMs do a decent job.
I think all conversations about coding with LLMs, vibe coding, etc. need to note the domain and choice of programming language.
IMHO those two variables are 10x (maybe 100x) more explanatory than any vibe coding setup one can concoct.
Anyone who is befuddled by how the other person {loves, hates} using LLMs to code should ask what kind of problem they are working on and then try to tackle the same problem with AI to get a better sense for their perspective.
Until then, every one of these threads will have dozens of messages saying variations of "you're just not using it right" and "I tried and it sucks", which at this point are just noise, not signal.
They should also share their prompts and discuss exactly how much effort went into checking the output and re-prompting to get the desired result.
The post hints at how much work it takes for the human:
"If you are able to describe problems in a clear way and, if you are able to accept the back and forth needed in order to work with LLMs ... you need to provide extensive information to the LLM: papers, big parts of the target code base ... And a brain dump of all your understanding of what should be done. Such braindump must contain especially the following:" and more.
After all the effort of getting to the point where the generated code is acceptable, one has to wonder: why not just write it yourself? The time spent typing is trivial compared to all the cognitive effort involved in describing the problem, and describing the problem in a rigorous way is the essence of programming.
I would assume the argument is that you only need to provide the braindump and extensive information one time (or at least, collect it once, if not upload once) and then you can take your bed of ease as the LLM uses that for many tasks.
The thing is, no one writes that much code, at least no one that cares about code reuse. Most of the time is spent collecting information (especially communication with stakeholders), and verifying that the code you wrote didn't break anything.
> After all the effort getting to the point where the generated code is acceptable, one has to wonder, why not just write it yourself?
You know, I would often ask myself that very question...
Then I discovered the stupid robots are good at designing a project: you ask them to produce a design document, argue over it with them for a while, make revisions and changes, explore new ideas, then, finally, ask them to produce the code. It's like being able to interact with the yaks you're trying to shave; what's not to love about that?
I find it even more sad when people come out of the woodwork on every LLM post to tell us that our positive experiences using LLMs are imagined and we just haven’t realized how bad they are yet.
Some people got into coding to code, rather than build things.
If the AI is doing the coding then that is a threat to some people. I am not sure why, LLMs can be good and you can enjoy coding...those things are unrelated. The logic seems to be that if LLMs are good then coding is less fun, lol.
Software jobs pay more than artist jobs because coding builds things. You can still be a code artist on your own time. Nobody is stopping you from writing in assembler.
And chess players stream as their primary income, because there's no money in Chess unless you're exactly the best player in the world (and even then the money is coming from sponsors/partners, not from chess itself).
The AI tooling churn is so fast that by the time a study comes out people will be able to say "well they were using an older tool" no matter what tool that the study used.
It's the eternal future: "AI will soon be able to...".
There's an entire class of investment scammers that string along their marks, claiming that the big payoff is just around the corner while they fleece the victim with the death of a thousand cuts.
Not really. Chatting with an LLM was the cutting edge for 3 years; it's only within the last 8-10 months, with Claude Code and the Gemini CLI, that we have the next big change in how we interact with LLMs.
If there are paradigm-shattering improvements every six months, every single study that is ever released will be "behind" or "use an older tool." In six months when a study comes out using Claude Code, people dissatisfied with it will be able to point to the newest hotness, ad infinitum.
Could it not be that those positive experiences are just shining a light that the practices before using an LLM were inefficient? It’s more a reflection on the pontificator than anything.
Sure, but even then the perspective makes no sense. The common argument against AI at this point (e.g. OP) is that the only reason people use it is because they are intentionally trying to prop up high valuations - they seem unable to understand that other people have a different experience than they do. You’d think that just because there are some cases where it doesn’t work doesn’t necessarily mean that 100% of it is a sham. At worst it’s just up to individual taste, but that doesn’t mean everyone who doesn’t share your taste is wrong.
Consider cilantro. I’m happy to admit there are people out there who don’t like cilantro. But it’s like the people who don’t like cilantro are inventing increasingly absurd conspiracy theories (“Redis is going to add AI features to get a higher valuation”) to support their viewpoint, rather than the much simpler “some people like a thing I don’t like”.
"Redis for AI is our integrated package of features and services designed to get your GenAI apps into production faster with the fastest vector database."
posting a plain text description of your experience on a personal blog isn't exactly screaming. in the noise of the modern internet this would be read by nobody if it wasn't coming from one of the most well known open source software creators of all time.
people who believe in open source don't believe that knowledge should be secret. i have released a lot of open source myself, but i wouldn't consider myself a "true believer." even so, i strongly believe that all information about AI must be as open as possible, and i devote a fair amount of time to reverse engineering various proprietary AI implementations so that i can publish the details of how they work.
why? a couple of reasons:
1) software development is my profession, and i am not going to let anybody steal it from me, so preventing any entity from establishing a monopoly on IP in the space is important to me personally.
2) AI has some very serious geopolitical implications. this technology is more dangerous than the atomic bomb. allowing any one country to gain a monopoly on this technology would be extremely destabilizing to the existing global order, and must be prevented at all costs.
LLMs are very powerful, they will get more powerful, and we have not even scratched the surface yet in terms of fully utilizing them in applications. staying at the cutting edge of this technology, and making sure that the knowledge remains free, and is shared as widely as possible, is a natural evolution for people who share the open source ethos.
If consumer "AI", and that includes programming tools, had real geopolitical implications it would be classified.
The "race against China" is a marketing trick to convince senators to pour billions into "AI". Here is who is financing the whole bubble to a large extent:
People are insane; you can artificially pine for the simpler, better times made up in your mind, back when you could give Oracle all your money.
But I would stake my very life on the fact that the movement by developers we call open-source is the single greatest community and ethos humanity has ever created.
Of course it inherits from enlightenment and other thinking, it doesn't exist in a vacuum, but it is an extension of the ideologies that came before it.
I challenge anyone to come up with any single modern subculture that has tangibly generated more, that touches more lives, moves more weight, travels farther, and affects humanity more every single day, from the moment they wake up, than the open source software community (in the catholic sense, obviously).
Both in moral goodness and in measurable improvement in standard of living and understanding of the universe.
Some people's memories are very short indeed; all who pine, pine for who they imagined they were and are consumed by a memetic desire for their imagined selves.
So ironic that you post this on Hacker News, where there are regularly articles and blog posts about lessons from the industry, both good and bad, that would be helpful to competitors. This industry isn’t exactly Coke guarding its secret recipe.
I think many devs are guarding their secrets, but the last few decades have shown us that an open foundation can net huge benefits for everyone (and then you can put your secret sauce in the last mile.)
OP, as a free user of Gemini 2.5 Pro via AI Studio, my friend has been hit by the equivalent of a car and has been broken for approximately 3 weeks. I hope they can recover soon; it is not easy for them.
Please send your thoughts and prayers to Gemini 2.5 Pro; hopefully they can recover and get well soon enough. I hope Google lets them out of the hospital soon and discharges them; the last 3 weeks have been hell for me without them there.
IMO Claude Code was a huge step up. We have a large and well-structured Python code base revolving mostly around a large and complicated adapter pattern; Claude is almost fully capable of implementing a new adapter if given the right prompt/resources.
Have used Claude's GitHub action quite a bit now (10-20 issue implementations, a bit more PR reviews), and it is hit and miss, so I agree with the enhanced coding approach rather than just letting it run loose.
When the change is a very small, self-contained feature/refactor, it can mostly work alone; if you have tests that cover the feature then it is relatively safe (and you can do other stuff because it is running in an action, which is a big plus... write the issue and you are done, sometimes I have had Claude write the issue too).
When it gets to a more medium size, it will often produce something that will appear to work but actually doesn't. Maybe I don't have test coverage and it is my fault but it will do this the majority of the time. I have tried writing the issue myself, adding more info to claude.md, letting claude write the issue so it is a language it understands but nothing works, and it is quite frustrating because you spend time on the review and then see something wrong.
And anything bigger, unsurprisingly, it doesn't do well.
PR reviews are good for small/medium tasks too. Bar is lower here though, much is useless but it does catch things I have missed.
So, imo, still quite a way from being able to do things independently. For small tasks, I just get Claude to write the issue, and wait for the PR...that is great. For medium (which is most tasks), I don't need to do much actual coding, just directing Claude...but that means my productivity is still way up.
I did try Gemini but I found that when you let it off the leash and accept all edits, it would go wild. We have Copilot at work reviewing PRs, and it isn't so great. Maybe Gemini is better on large codebases where, I assume, Claude will struggle.
What is the overall feedback loop with LLMs writing code? Do they learn as they go like we do? Do they just learn from reading code on GitHub? If the latter, what happens as less and less code gets written by human experts? Do the LLMs then stagnate in their progress and start to degrade? Kind of like making analog copies of analog copies of analog copies?
Code and math are similar to chess/go, where verification is (reasonably) easy so you can generate your own high-quality training data. It's not super straightforward, but you should still expect more progress in coming years.
Unlike OP, from my still limited but intense month or so diving into this topic so far, I had better luck with Gemini 2.5 PRO and Opus 4 on a more abstract level like architecture etc., and then feeding input to Sonnet for coding. I found 2.5 PRO, and to a lesser degree Opus, were hit or miss; a lot of instances of them circling around the issue and correcting themselves when coding (Gemini especially so), whereas Sonnet would cut to the chase, but needed an explicit take on it to be efficient.
Totally possible. In general I believe that while more powerful in their best outputs, Sonnet/Opus 4 are in other ways (alignment / consistency) a regression on Sonnet 3.5v2 (often called Sonnet 3.6), as Sonnet 3.7 was. Also models are complex objects, and sometimes in a given domain a given model that on paper is weaker will work better. And, on top of that: interactive use vs agent requires different reinforcement learning training that sometimes may not be towards an aligned target... So also using the model in one way or the other may change how good it is.
This is my experience too. I usually use Gemini 2.5 Pro through AI Studio for big design ideas that need to be validated and refined. Then take the refined requirements to Claude Code which does an excellent job most of the time in coding them properly. Recently I tried Gemini CLI, and it's not even close to Claude Code's sharp coding skills. It often makes syntax mistakes, and get stuck trying to get itself out of a rut; its output is so verbose (and fast) that it's hard to follow what it's trying to do. Claude Code has a much better debugging capability.
Another contender in the "big idea" reasoning camp: DeepSeek R1. It's much slower, but most of the time it can analyze problems and get to the correct solution in one shot.
I have found that if I ask the LLM to first _describe_ to me what it wants to do without writing any code, then the subsequent code generated has much higher quality. I will ask for a detailed description of the things it wants to do, give it some feedback and after a couple of iterations, tell it to go ahead and implement it.
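A minimal sketch of that describe-first flow against an OpenAI-compatible API (the model name, the file and the prompts are placeholders; the point is that code is only requested in the final turn):

    # Sketch of the describe-first flow: ask for a plan, iterate on it, and only
    # then ask for the code. OpenAI-compatible client; the model name, the file
    # being discussed and the prompts are all placeholders.
    from openai import OpenAI

    client = OpenAI()                       # reads OPENAI_API_KEY from the environment
    MODEL = "gpt-4o"                        # placeholder model name
    source = open("cache.c").read()         # the code you want changed

    history = [{"role": "user",
                "content": "Describe in detail, without writing any code, how you "
                           "would add LRU eviction to this cache:\n\n" + source}]

    def turn(prompt=None):
        if prompt:
            history.append({"role": "user", "content": prompt})
        reply = client.chat.completions.create(model=MODEL, messages=history)
        msg = reply.choices[0].message.content
        history.append({"role": "assistant", "content": msg})
        return msg

    plan = turn()                                                   # 1. plan only, no code
    plan = turn("Keep the eviction logic out of the hash table "
                "module; revise the plan.")                         # 2. a feedback round or two
    code = turn("Good. Now implement the revised plan.")            # 3. only now ask for code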
“Always be part of the loop by moving code by hand from your terminal to the LLM web interface: this guarantees that you follow every process. You are still the coder, but augmented.”
I agree with this, but this is why I use a CLI. You can pipe files instead of copying and pasting.
Yeah it is also a bit of a shibboleth: vibes coding, when I'm productive for the 80% case with Claude code, is about the LLM cranking for 10-20min. I'm instructing & automating the LLM on how to do its own context management, vs artisanally making every little decision.
Ex: Implementing a spec, responding to my review comments, adding wider unit tests, running a role play for usability testing, etc. The main time we do what he describes, manually copying into a web IDE, is occasionally for a better short use of a model, like only at the beginning of some plan generation, or debugging from a bunch of context we have gathered manually. Like we recently solved some nasty GPU code race this way, using a careful mix of logs and distributed code. Most of our job is using Boring Tools to write Boring Code, even if the topic/area is neato: you do not want your codebase to work like an adventure for everything, so we invest in making it look boring.
I agree with what the other commenter said: I manage context as part of the skill, but by making the AI do it. Doing that by hand is like slowly hand-coding assembly. Instead, I'm telling Claude Code to do it. Ex: Download and crawl some new dependency I'm using for some tricky topic, or read in my prompt template markdown for some task, or generate and self-maintain some plan.md with high-level rules on context I defined. This is the 80% case.
Maybe one of the disconnects is task latency vs throughput as trade-offs in human attention. If I need the LLM to get to the right answer faster, so the task is done faster, I have to lean in more. But my time is valuable and I have a lot to do. I'd rather spend 50% less of my time per task, even if the task takes 4x longer because the LLM is spinning longer. In that saved human time, I can be working on another task: I typically have 2-3 terminals running Claude, so I only check in every 5-15 min.
* DevOps infrastructure: docker, aws, ci systems, shell scripts, ...
* Analytics & data processing
* AI investigations (logs, SIEMs, ..) <--- what we sell!
* GPU kernels
* Compilers
* Docs
* Test amplification
* Spec writing
I think ~half the code written by professional software engineers fits into these, or other vibe-friendly domains. The stuff antirez does with databases seems close to what we do with compilers, GPU kernels, and infra.
We are still not happy with the production-grade frontend side of coding, though by being strong on API-first design and keeping logic vs UI separated, most of our UI code is friendly to headless.
I currently use LLMs as a glorified Stack Overflow. If I want to start integrating an LLM like Gemini 2.5 PRO into my IDE (I use Visual Studio Code), what's the best way to do this? I don't want to use a platform like Cursor or Claude Code which takes me away from my IDE.
Cursor is an IDE. You can use its powerful (but occasionally wrong) autocomplete, and start asking it to do small coding tasks using the Ctrl+L side window.
I don't either but unfortunately Cursor is better than all the other plugins for IDEs like JetBrains. I just tab over to cursor and prompt it, then edit the code in my IDE of choice.
Worth noting that Cursor is a VS Code fork and you can copy all of your settings over to it. Not saying that you have to, of course, but that it's perhaps not as different as you might be imagining.
Thank you! When I was testing out Copilot I was stuck with whatever default LLM was being used. Didn't realize you could switch it out for a non-MS/OpenAI model.
My question on all of the "can't work with big codebases" is: what would a codebase that was designed for an LLM look like? Composed of many many small functions that can be composed together?
You can use an LLM to help document a codebase, but it's still an arduous task because you do need to review and fix up the generated docs. It will make, sometimes glaring sometimes subtle, mistakes. And you want your documentation to provide accuracy rather than double down on or even introduce misunderstanding.
This fact is one of the most pleasant surprises I’ve had during this AI wave. Finally, a concrete reason to care about your docs and your code quality.
And on top of that - can you steer an LLM to create this kind of code? In my experience the models don’t really have a „taste” for detecting complexity creep and reengineering for simplicity, in the same way an experienced human does.
I am vibe coding a complex app. You can certainly keep things clean but the trick is to enforce a rigid structure. This does add a veneer of complexity but simplifies "implement this new module" or "add this feature across all relevant files".
I found that it is beneficial to create more libraries. If I for example build a large integration with an API (basically a whole API client), I would in the past have had it in the same repo, but now I make it a standalone library.
I think it means finer toplevel granularity re: what's runnable/testable at a given moment. I've been exploring this for my own projects and although it's not a silver bullet, I think there's something to it.
----
Several codebases I've known have provided a three-stage pipeline: unit tests, integration tests, and e2e tests. Each of these batches of tests depend on the creation of one of three environments, and the code being tested is what ends up in those environments. If you're interested in a particular failing test, you can use the associated environment and just iterate on the failing test.
For humans with a bit of tribal knowledge about the project, humans who have already solved the get-my-dev-environment-set-up problem in more or less uniform way, this works ok. Humans are better at retaining context over weeks and months, whereas you have to spin up a new session with an LLM every few hours or so. So we've created environments for ourselves that we ignore most of the time, but that are too complex to be bite sized for an agent that comes on the scene as a blank slate every few hours. There are too few steps from blank-slate to production, and each of them is too large.
But if successively more complex environments can be built on each other in arbitrarily many steps, then we could achieve finer granularity. As a nix user, my mental model for this is function composition where the inputs and outputs are environments, but an analogous model would be layers in a docker files where you test each layer before building the one on top of it.
Instead of maybe three steps, there are eight or ten. The goal would be to have both whatever code builds the environment, and whatever code tests it, paired up into bite-sized chunks so that a failure in the pipeline points you a specific stage which is more specific that "the unit tests are failing". Ideally test coverage and implementation complexity get distributed uniformly across those stages.
Keeping the scope of the stages small maximizes the amount of your codebase that the LLM can ignore while it works. I have a flake output and nix devshell corresponding to each stage in the pipeline and I'm using pytest to mark tests based on which stage they should run in. So I run the agent from the devshell that corresponds with whichever stage is relevant at the moment, and I introduce it to only the tests and code that are relevant to that stage (the assumption being that all previous stages are known to be in good shape). Most of the time, it doesn't need to know that it's working on stage 5 of 9, so it "feels" like a smaller codebase than it actually is.
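As a concrete example, the pytest side of this can be just one marker per stage plus a -m filter; a sketch where the stage names and the test are illustrative:

    # conftest.py: register one marker per pipeline stage so a run can be scoped
    # to a single stage with `pytest -m stageN` from that stage's devshell.
    # Stage names/count are illustrative.
    def pytest_configure(config):
        for i in range(1, 10):
            config.addinivalue_line("markers", f"stage{i}: tests for pipeline stage {i}")

    # test_service.py: a test that only makes sense once the stage-5 environment
    # is up (the check itself is a placeholder).
    import pytest

    @pytest.mark.stage5
    def test_healthcheck_answers():
        assert True

    # From the stage-5 devshell:  pytest -m stage5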
If evidence emerges that I've engaged the LLM at the wrong stage, I abandon the session and start over at the right level (now 6 of 9 or somesuch).
I used a similar setup until a few weeks ago, but coding agents became good enough recently.
I don’t find context management and copy pasting fun, I will let GitHub Copilot Insiders or Claude Code do it. I’m still very much in the loop while doing vibe coding.
Of course it depends on the code base, and Redis may not benefit much from coding agents.
But I don’t think one should reject vibe coding at this stage, it can be useful when you know what the LLMs are doing.
I find agentic coding to be best when using one branch per conversation. Even if that conversation is only a single bugfix, branch it. Then do 2 or 3 iterations of that same conversation across multiple branches and choose the best result of the 3 and destroy the other two.
Lovely post @antirez. I like the idea that LLMs should be directly accessing my codebase and there should be no agents in between. Basically no software that filters what the LLM sees.
That said, are there tools that make going through a codebase easier for LLMs? I guess tools like Claude Code simply grep through the codebase and find out what Claude needs. Is that good enough or are there tools which keep a much more thorough view of the codebase?
Terminal with vim on one side, the official web interface of the model on the other side. The pbcopy utility to pass stuff via the clipboard. I believe models should be used in their native interface, as when there are other layers sometimes the model served is not exactly the same, other times it misbehaves because of RAG, and in general you have no exact control of the context window.
This seems like a lot of work depending upon the use case. e.g. the other day I had a bunch of JSON files with contact info. I needed to update them with more recent contact info on an internal Confluence page. I exported the Confluence page to a PDF, then dropped it into the same directory as the JSON files. I told Claude Code to read the PDF and use it to update the JSON files.
It tried a few ways to read the PDF before coming up with installing PyPDF2, using that to parse the PDF, then updated all the JSON files. It took about 5 minutes to do this, but it ended up 100% correct, updating 7 different fields across two dozen JSON files.
(The reason for the PDF export was to get past the Confluence page being behind Okta authentication. In retrospect, I probably should've saved the HTML and/or let Claude Code figure out how to grab the page itself.)
How would I have done that with Gemini using just the web interface?
He uses vim and copy paste code from web interfaces because he wants to maintain control and understanding of the code. You can find proofs of this setup on his youtube channel [https://www.youtube.com/@antirez]
Thanks. Also, based on the coding rig you use, models may not match the performance of what is served via the web. Or may not be as cheap. For instance the Gemini 2.5 Pro $20 account is very hard to saturate with queries.
Can anyone recommend a workflow / tools that accomplishes a slightly more augmented version of antirez’ workflow & suggestions minus the copy-pasting?
I am on board to agree that pure LLM + pure original full code as context is the best path at the moment, but I’d love to be able to use some shortcuts like quickly applying changes, checkpoints, etc.
My persistent (and not unfounded?) worry is that all the major tools & plugins (Cursor, Cline/Roo) all play games with their own sub-prompts and context “efficiency”.
Claude Code has worked well for me. It is easy to point it to the relevant parts of the codebase and see what it decides to read itself, so you can provide missing pieces of code when necessary.
Since I’ve heard Gemini-cli is not yet up to snuff, has anyone tried opencode+gemini? I’ve heard that with opencode you can login with Google account (have NOT confirmed this, but if anyone has any experience, pls advise) so not sure if that would get extra mileage from Gemini’s limits vs using a Gemini api key?
Yep, when I use agents I go for Claude Code. For example I needed to buy more Commodore 64s than appropriate lately, and I let it code a Telegram bot advising me when popular sources would have interesting listings. It worked (after a few iterations); then I looked at the code base and wanted to puke, but who cares in this case? It worked, it was much faster, and I had zero to learn from doing it myself: I published a Telegram library for C in the past and know how it works and how to do scraping and so forth.
> For example I needed to buy more Commodore 64s than appropriate lately
Been there, done that!
For those one-off small things, LLMs are rather cool, especially Claude Code and Gemini CLI. I was given an archive of some really old movies recently, but the files were bearing title names in Croatian instead of the original (mostly English) ones. So I ran claude --dangerously-skip-permissions in the directory with the movies and in a two-sentence prompt I asked it to rename the files into a given format (that I tend to have in my archive) and for each title to find the original name and year of release and use it in the file name.. but, before committing the rename, to give me a list of before and after for approval. It took like what, a minute of writing a prompt.
Now, for larger things, I'm still exploring a way, an angle, what and how to do it. I've tried everything from yolo prompting to structured and uber-structured approaches, all the way to mimicking product/PRD - architecture - project management / tasks - developer/agents.. So far, unless it's a rather simple project, I don't see it happening that way. The most luck I had was with "some structure" as context and inputs, and then guided prompting during sessions and reviewing stuff. Almost pair-programming.
I found it depends very much on the task. For "architect" sessions you need as much context as you can reasonably gather. The more the merrier. At least gemini2.5 pro will gather the needed context from many files and it really does make a difference when you can give it a lot of it.
On coding you need to aggressively prune it, and only give minimum adjacent context, or it'll start going on useless tangents. And if you get stuck just refresh and start from 0, changing what is included. It's often faster than "arguing" with the LLM in multi-step sessions.
(the above is for existing codebases. for vibe-coding one-off scripts, just go with the vibes, sometimes it works surprisingly well from a quick 2-3 lines prompt)
In my experience as well, Sonnet 4 is much better than Opus. Opus is great at the start of a project, where you need to plan things, structure the project, and figure out how to execute, but it cannot beat Sonnet at actually executing it. It is also a lot cheaper.
OP, I think Gemini 2.5 Pro is in the hospital and has been recovering for the last 2 weeks; let's all wish our good friend a good recovery and hope they can get back to their normal selves.
> it's still sad to see the normalization of private (i.e., paid) LLM models
The models I can run locally aren't as good yet, and are way more expensive to operate.
Once it becomes economical to run a Claude 4 class model locally you'll see a lot more people doing that.
The closest you can get right now might be Kimi K2 on a pair of 512GB Mac Studios, at a cost of about $20,000.
> But you become their prisoner in the long run, with increasing prices and degrading quality.
Can you expand on your argument?
Not op, but here's something from a few days ago that might be interesting for you:
https://news.ycombinator.com/item?id=44598254
Currently in the front page of HN: https://news.ycombinator.com/item?id=44622953
It isn’t specific to software/subscriptions but there are plenty of examples of quality degradation in the comments
enshittification/vendor-lockin/stickiness/… take your pick
> I cannot understand how programmers don't mind adding a strong dependency on a third party in order to keep programming
And how they don't mind freely opening up their codebase to these bigtech companies.
> I can switch to vim (or vscode) any time if I need to ... The problem with paid LLMs is that one cannot easily switch to open-source ones
People who understand the importance of this choice but still opt for closed source software are the worst of the worst.
You won’t be able to switch to a meaningful vim if you channel your support to closed source software, not for long.
Best to put money where the mouth is.
I don't contribute to vim specifically, but I do contribute to other open source projects. So I do like to keep this balance between making open source tools better over time and using paid alternatives. I don't think that's possible with LLMs at the moment, though (and I don't think it will be possible in the future, but of course I could be wrong).
I was a hardcore vim user 10 years ago, but now I just use PyCharm to work. I'm paid to solve problems, not to futz around with vim configs.
Can you make vim work roughly the same way? Probably you can get pretty close. But how many hours do I have to sink into the config? A lot. And suddenly the PyCharm license is cheap.
And it's exactly the same thing with LLMs. You want hand crafted beautiful code, untainted by AI? You can still do that. But I'm paid to solve problems. I can solve them faster/solve more of them? I get more money.
> I was a hardcore vim user 10 years ago, but now I just use PyCharm to work. I'm paid to solve problems, not to futz around with vim configs.
The reason I don't like those arguments is that they conflate two orthogonal things: solving problems and optimizing your tooling. You can optimize PyCharm just as much as you can fiddle with Vim's config. And people are solving problems with Vim just as you do with an IDE. It's just a matter of preference.
In my day job, I have two IDEs, VSCode, and Emacs open. I prefer Emacs for editing and git usage, but there are a few things that only the IDEs can do (as in, I don't bother setting up Emacs to do the same), and VSCode is there because people get dizzy with the way I switch buffers in Emacs.
Open-weight and open-source LLMs are improving as well. While there will likely always be a gap between closed, proprietary models and open models, at the current pace the capabilities of open models could match today’s closed models within months.
> because they are not as good as the paid ones
The alternative is to restrict yourself to “not as good” ones already now.
Anyone can switch from Claude to llama?
I don't think so. Let's do a silly experiment: antirez, could you ditch Gemini 2.5 PRO and Claude Opus 4, and instead use llama? Like never again go back to Gemini/Claude. I don't think he can (I don't think he would want to). And this is not on antirez, this is on everyone who's paying for LLMs at the moment: they are paying for them because they are so damn good compared to the open source ones... so there's no incentive to switch. But again, that's like climate change: there's no incentive to pollute less (well, perhaps to save us, but money is more important).
Ah, there have been community editions of the most important tools for 10+ years, and I doubt e.g. MS will close the VS.NET Community Version in the future.
If the models are getting cheaper, better, and freer even when we use paid ones, then right now is the time to develop techniques, user interfaces, and workflows that become the inspirations and foundations of a future world of small, local, and phenomenally powerful models that have online learning, that can formalize their reasoning, that can bake deduction into their own weights and code.
> Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools. I am afraid that in a few years, that will no longer be possible .. The excuse "but you earn six figures, what' $200/month to you?" doesn't really capture the issue here.
One aspect of this that's really sad to think about is, coding (and to a lesser extent IT in general) at one point was a real meritocracy, where skill mattered more than expensive/unnecessary academic pedigree. Not perfect of course, but real nevertheless. And coders were the first engineers who really said "I won't be renting a suit for an interview, I think an old t-shirt is fine" and we normalized that. Part of this was just uncompromisingly practical.. like you can either do the work or not, and fuck the rest of that noise. But there was also a pretty punk aspect to this for many people in the industry.. some recognition that needing to have money to make money was a bullshit relic of closeted classism.
But we're fast approaching a time where both the old metrics (how much quality code are you writing how fast and what's your personal open source portfolio like?) and the new metrics (are you writing a blog post every week about your experience with the new models, is your personal computer fast enough to even try to run crappy local models?) are both going to favor those with plenty of money to experiment.
It's not hard to see how this will make inequality worse and disadvantage junior devs, or just talented people that didn't plan other life-events around purchasing API credits/GPUs. A pay-to-play kind of world was ugly enough in politics and business so it sucks a lot to see it creeping into engineering disciplines but it seems inevitable.
Yes, and what is worse is that the same mega-corporations who have been ostensibly promoting equity until 2025 are now pushing for a gated development environment that costs the same as a monthly rent in some countries or more than a monthly salary in others.
That problem does not even include lock-in, surveillance, IP theft and all other things that come with SaaS.
It’s not that bad: K2 and DeepSeek R1 are at the level of frontier models of one year ago (K2 may be even better: I have enough experience only with V3/R1). We will see more coming, since LLMs are incredibly costly to train but very simple in their essence (it’s as if their fundamental mechanic is built into the physical nature of computation itself), so the barrier to entry is large but not insurmountable.
It’s weird that programmers will champion paying for LLMs but not for ad-free web search.
Ad-free search doesn't by itself produce a unique product. It's just a product that doesn't have noise, noise that people with attention spans and focus don't experience at all.
Local models are not quite there yet. For now, use the evil bad tools to prepare for the good free tools when they do get there. It's a self-correcting form of technical debt that we will never have to pay down.
“To prepare for the good free tools”
Why do I have to prepare? Once the good free tools are available, it should just work no?
I pay for search and have convinced several of my collaborators to do so as well
I think the dev population mostly uses free search, just based on the fact no one has told me to “Kagi it” yet.
When I need a facial tissue I ask for a Kleenex even if the box says Puffs. Because who says "pass me the Puffs"?
I've been curious about that phenomenon: why not just ask "pass me a tissue"?
They have adblock
Paid models are just much, much better.
Of course they are. I wouldn't expect otherwise :)
But the price we're paying (and I don't mean money) is very high, imho. We all talk about how good engineers write code that depends on high-level abstractions instead of low-level details, allowing us to replace third-party dependencies easily and test our apps more effectively, keeping the core of our domain "pure". Well, isn't it time we started doing the same with LLMs? I'm not talking about MCP, but rather an open source tool that can plug into either free and open source LLMs or private ones. That would at least allow us to switch to a free and open-source version if the companies behind the private LLMs go rogue. I'm afraid, though, that wouldn't be enough, but it's a starting point.
To give an example: what would you think if you had to pay for every single Linux process on your machine? Or for every Git commit you make? Or for every debugging session you perform?
> an open source tool that can plug into either free and open source LLMs or private ones
Fortunately there are many of these that can integrate with offline LLMs through systems like LiteLLM/Ollama/etc. Off the top of my head, I'd look into Continue, Cline and Aider.
https://github.com/continuedev/continue
https://github.com/cline/cline
https://github.com/Aider-AI/aider
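For illustration, the "thin abstraction" most of these tools rely on is just an OpenAI-compatible chat endpoint, which hosted providers and local runners like Ollama both expose. A minimal sketch (the model names and the local port are assumptions, not a recommendation):

    # Sketch: one code path, swappable backends via an OpenAI-compatible API.
    # Assumes the `openai` Python package and, for the local case, an Ollama
    # server on its default port; the model names are placeholders.
    import os
    from openai import OpenAI

    def make_client(use_local: bool) -> tuple[OpenAI, str]:
        if use_local:
            # Ollama serves an OpenAI-compatible endpoint; the API key is ignored.
            return OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"), "llama3.1"
        return OpenAI(api_key=os.environ["OPENAI_API_KEY"]), "gpt-4o-mini"

    client, model = make_client(use_local=True)
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Explain this function:\n" + open("example.py").read()}],
    )
    print(reply.choices[0].message.content)

If a provider goes rogue, the escape hatch is changing the base URL and the model name; the rest of the tooling stays the same.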
> I'm not talking about MCP, but rather an open source tool that can plug into either free and open source LLMs or private ones. That would at least allow us to switch to a free and opensource version if the companies behind the private LLMs go rogue. I'm afraid tho that wouldn't be enough, but it's a starting point.
There are open source tools that do exactly that already.
Ah, well, that's nice. But every single post I read doesn't mention them? So I assume they are not popular for some reason. Again, my main point here is the normalization of using private LLMs. I don't see anyone talking about it; we are all just handing over a huge part of what it means to build software to a couple of enterprises whose goal is, of course, to maximize profit. So, yeah, perhaps I'm overthinking it, I don't know; I just don't like that these companies are now so ingrained in the act of building software (just like AWS is so ingrained in the act of running software).
>every single post I read doesn't mention them
Because the models are so much worse that people aren't using them.
Philosophical battles don't pay the bills and for most of us they aren't fun.
There have been periods of my life where I stubbornly persisted in using something inferior for various reasons - maybe I was passionate about it, maybe I wanted it to exist and was willing to spend my time debugging and offering feedback - but there are a finite number of hours in my life, and often I'd much rather pay for something that works well than throw my heart, soul, time, and blood pressure at something that will only give me pain.
Does every single post about a Jetbrains feature mention that you can easily switch from Jetbrains to an open source editor like VS Code or vim?
> I'm not talking about MCP, but rather an open source tool that can plug into either free and open source LLMs or private ones.
I have been building that for a couple of years now: https://llm.datasette.io
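Roughly, per its docs (worth double-checking the exact names), the same call works against hosted or local models because backends are plugins:

    # Rough sketch of the llm Python API; see llm.datasette.io for the real docs.
    import llm

    model = llm.get_model("gpt-4o-mini")  # or a local model via a plugin such as llm-ollama
    response = model.prompt("Summarize what this diff changes:\n" + open("change.diff").read())
    print(response.text())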
> I'm not talking about MCP, but rather an open source tool that can plug into either free and open source LLMs or private ones.
Has someone computed/estimated the at-cost dollar value of utilizing these models at full tilt: several messages per minute and at least 500,000-token context windows? What we need is a Wikipedia-like effort to support something truly open and continually improving in its quality.
None of that applies here since we could all easily switch to open models at a moment's notice with limited costs. In fact, we switch between proprietary models every few months.
It just so happens that closed models are better today.
I personally can’t wait for programming to ‘die’. It has stolen a decade of my life, minimum. Like veterinarians being trained to help pets ultimately finding out a huge portion of the job is killing them. I was not sufficiently informed that I’d spend a decade arguing about languages, dealing with thousands of other developers with diverging opinions, legacy code, poorly (if at all) maintained libraries, tools, frameworks, etc. If you have been in the game at least a decade, please don’t @ me. Adios to programming as it was (happily welcoming a new DIFFERENT reality, whatever that means). Nostalgia is for life, not staring at a screen 8 hrs a day.
> It has stolen a decade of my life minimum.
Feels like this is a byproduct of a poor work-life balance more than an intrinsic issue with programming itself. I also can't really relate since I've always enjoyed discussing challenging problems with colleagues.
I'm assuming by "die" you mean some future where autonomous agentic models handle all the work. In this world, where you can delete your entire programming staff and have a single PM who tells the models what features to implement next, where do you imagine you fit in?
I just hope for your sake that you have a fallback set of viable skills to survive in this theoretical future.
You got some arguably rude replies to this but you're right. I've been doing this a long time and the stuff you listed is never the fun part despite some insistence on HN that it somehow is. I love programming as a platonic ideal but those moments are fleeting between the crap you described and I can't wait for it to go.
Feel free to change careers and get lost, no one is forcing you to be a programmer.
If you feel it is stealing your life, then please feel free to reclaim your life at any time.
Leave the programming to those of us who actually want to do it. We don't want you to be a part of it either
Maybe it's just not for you.
I've been programming professionally since 2012 and still love it. To me the sweet spot must've been the early mid 2000s, with good enough search engines and ample documentation online.
Did you expect computer programming not to involve this much time at a computer screen? Most modern jobs especially in tech do. If it’s no longer fulfilling, it might be worth exploring a different role or field instead of waiting for the entire profession to change.
I understand your frustration but the problem is mostly people. Not the particular skill itself.
IMO it's not unlike all the other "dev" tools we use. There are tons of free and open tools that usually lag a bit behind the paid versions. People pay for JetBrains, for macOS, and even to search the web (google ads).
There are very powerful open-weight models, though they are not the cutting edge. Even those you can't really run locally, so you'd have to pay a third party to run them.
Also, the competition is awesome to see: these companies are all trying hard to get customers and build the best model, driving prices down and giving you options. No one company has all of the power; it's great to see capitalism working.
You don't pay for macOS; you pay for the Apple device, and the operating system is free.
You do pay for the operating system. And for future upgrades to the operating system. Revenue recognition is a complex and evolving issue.
Thanks captain missing the point
LLMs are basically free? Yes, you're rate limited, but I have only just started paying for them now; before, I'd bounce around between the providers and still stay free.
> Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools.
Since when? It starts with computers, the main tool, and their architecture not being free, and goes from there. Major compilers used to not be free. Major IDEs used to not be free. For most things there were decent and (sometimes) superior free alternatives. The same is true for LLMs.
> The excuse "but you earn six figures, what' $200/month to you?" doesn't really capture the issue here.
That "excuse" could exactly capture the issue. It does not, because you chose to make it a weirder issue. Just as before: You will be free to either not use LLMs, or use open-source LLMs, or use paid LLMs. Just as before in the many categories that pertain to programming. It all comes at a cost, that you might be willing to pay and somebody else is free to really does not care that much about.
> Major compilers used to not be free. Major IDEs used to not be free.
There were and are a lot of non-free ones, but since the 1990s, GCC and interpreted languages and Linux and Emacs and Eclipse and a bunch of kinda-IDEs were all free, and now VS Code is one of the highest marketshare IDEs, and those are all free. Also, the most used and learned programming language is JS, which doesn't need compilers in the first place.
There are free options and there continue to be non-free options. The same is true for LLMs.
When's the last time you paid for a compiler?
The original point was that there is some inherent tradition in programming being free, with a direct critique wrt LLMs, which apparently breaks that tradition.
And my point is that's simply not the case. Different products have always been not free, and continue to be not free. A recent example would be something like Unity, which is not entirely free but has competitors that are entirely free and open source. JetBrains is something someone else brought up.
Again: You have local LLMs and I have every expectation they will improve. What exactly are we complaining about? That people continue to build products that are not free and, gasp, other people will pay for them, as they always have?
> Major compilers used to not be free
There's never been anything stopping you from building your own
Soon there will be. The knowledge of how to do so will be locked behind LLMs, and other sources of knowledge will be rarer and harder to find as a result of everything switching to LLM use
For the past decades knowledge was "locked" behind search engines. Could you have rolled your own search engine indexing the web, to unlock that knowledge? Yes, in the same theoretical way that you can roll your own LLM.
There was never anything stopping you from finding avenues other than search engines to get people to find your website. You could see a URL on a board at a cafe and still find a website without a search engine. More local, sure, but knowledge had ways to spread in the real world when it needed to.
How are LLMs equivalent? People posting their prompts on bulletin boards at cafes?
But what is (or will be) stopping you from finding avenues other than LLMs? You say other sources of knowledge will be rarer. But they will still exist, and I don't see why they will become less accessible than non-search-engine-indexed content is now.
> Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools.
Not without a lot of hard, thankless work by people like RMS to write said tools. Programming for a long while was the purview of the Microsoft Visual Studio family, which cost hundreds, if not thousands, of dollars. There existed other options, some of which were free, but, as is the case today with LLMs you can run at home, they were often worse.
This is why making software developer tools is such a tough market and why debugging remains basically in the dark ages (though there are the occasional bright lights like rr). Good quality tools are expensive, for doctors and mechanics, why do we as software developers expect ours to be free, libre and gratis?
Why do you see this as a strong dependency? The beauty of it is that you can change the model whenever you want. You can even just code yourself! This isn't some no-code stuff.
Doesn't already happen with some people being unable to code without Google or similar?
Kimi k2 exists now.
The issue is somebody will have to debug and fix what those LLM Leeches made up. I guess then companies will have to hire some 10x Prompters?
I'm certain these are advertorials masquerading as personal opinions. These people are being paid to promote the product, either through outright cash, credits on their platform or just swag.
So, just so I have this straight, you think antirez is being paid by Google to hype Gemini.
A lot of people are really bad at change. See: immigration. Short of giving everyone jazz improv lessons at school, there's nothing to be done.
To be fair, change is not always good. We still haven't fixed fitness/obesity issues caused (partly) by the invention of the car, 150 years later. I think there's a decent chance LLMs will have the same effect on the brain.
I recommend readjusting your advertorial-detecting radar. antirez isn't taking kickbacks from anyone.
I added a "disclosures" section to my own site recently, in case you're interested: https://simonwillison.net/about/#disclosures
It started out as an innocent kv cache before the redis industrial complex became 5% of the GDP
I'm going a little offtopic here, but I disagree with the OPs use of the term "PhD-level knowledge", although I have a huge amount of respect for antirez (beside that we are born in the same island).
This phrasing can be misleading and points to a broader misunderstanding about the nature of doctoral studies, one that has been influenced by the marketing and hype discourse surrounding AI labs.
The assertion that there is a defined "PhD-level knowledge" is pretty useless. The primary purpose of a PhD is not simply to acquire a vast amount of pre-existing knowledge, but rather to learn how to conduct research.
Agree with that. Read it as expert-level knowledge without all the other stuff LLMs can’t do as well as humans. LLMs’ way of expressing knowledge is kind of alien, as it is different, so indeed those are all poor simplifications. For instance, an LLM can’t code as well as a top human coder but can write a non-trivial program from the first to the last character without iterating.
Hey antirez,
What sticks out to me is Gemini catching bugs before production release, was hoping you’d give a little more insight into that.
The reason being that we expect AI to create bugs and we catch them, but if Gemini is spotting bugs by acting as a kind of QA (not just by writing and passing tests), then that piques my interest.
> but rather to learn how to conduct research
Further, I always assumed PhD-level knowledge meant coming up with the right questions. I would say it is at best a "Lazy Knowledge-Rich Worker": it won't explore hypotheses if you don't *ask it* to. A PhD would ask those questions of *themselves*. Let me give you a simple example:
The other day Claude Code (Max Pro subscription) commented out a bunch of test assertions as part of a related but separate test suite it was coding. It did not care to explore why it was commenting them out (which was a serious bug), because of a faulty assumption in the original plan. I had to ask it to change the plan by doing the ultra-think, think-hard trick to explore why it was failing, amend the plan and fix it.
The bug was that the ORM object had null values because it was not refreshed after the commit, and it had been fetched earlier by another DB session that had since been closed.
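For the curious, a hypothetical reconstruction of that failure mode; SQLAlchemy and the model names here are assumptions for illustration, not the actual codebase:

    # Hypothetical sketch: an object loaded by one session does not reflect changes
    # committed through another session unless it is refreshed or re-queried.
    from sqlalchemy.orm import Session

    def buggy_flow(engine, OrderModel, order_id):
        with Session(engine) as read_session:
            order = read_session.get(OrderModel, order_id)  # loaded here...
        # read_session is closed; `order` is now detached and frozen in time.

        with Session(engine) as write_session:
            write_session.get(OrderModel, order_id).status = "paid"
            write_session.commit()

        # Stale value: the right fix is to re-fetch (or refresh) the object,
        # not to comment out the failing assertion.
        return order.status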
If you understand that a PhD is about much more than just knowledge, it's still the case that having easy access to that knowledge is super valuable. My last job we often had questions that would just traditionally require a PhD-level person to answer, even if it wasn't at the limit of their research abilities. "What will happen to the interface of two materials if voltage is applied in one direction" type stuff, turns out to be really hard to answer but LLMs do a decent job.
Have you checked the LLM's response experimentally?
Anyway, I don't think these are "PhD-knowledge" questions, but job-related electrical engineering questions.
> The primary purpose of a PhD is not simply to acquire a vast amount of pre-existing knowledge, but rather to learn how to conduct research.
It’s not like once you have a PhD anyone cares about the subject, right? The only thing that matters is that you learnt to conduct research.
I can't understand why once you have a PhD anyone should care more about the subject.
I think all conversations about coding with LLMs, vibe coding, etc. need to note the domain and choice of programming language.
IMHO those two variables are 10x (maybe 100x) more explanatory than any vibe coding setup one can concoct.
Anyone who is befuddled by how the other person {loves, hates} using LLMs to code should ask what kind of problem they are working on and then try to tackle the same problem with AI to get a better sense for their perspective.
Until then, every one of these threads will have dozens of messages saying variations of "you're just not using it right" and "I tried and it sucks", which at this point are just noise, not signal.
They should also share their prompts and discuss exactly how much effort went into checking the output and re-prompting to get the desired result. The post hints at how much work it takes for the human, "If you are able to describe problems in a clear way and, if you are able to accept the back and forth needed in order to work with LLMs ... you need to provide extensive information to the LLM: papers, big parts of the target code base ... And a brain dump of all your understanding of what should be done. Such braindump must contain especially the following:" and more.
After all the effort getting to the point where the generated code is acceptable, one has to wonder, why not just write it yourself? The time spent typing is trivial to all the cognitive effort involved in describing the problem, and describing the problem in a rigorous way is the essence of programming.
> They should also share their prompts
Here's a recent ShowHN post (a map view for OneDrive photos), which documents all the LLM prompting that went into it:
https://news.ycombinator.com/item?id=44584335
I would assume the argument is that you only need to provide the braindump and extensive information one time (or at least, collect it once, if not upload once) and then you can take your bed of ease as the LLM uses that for many tasks.
The thing is, no one writes that much code, at least not anyone who cares about code reuse. Most of the time is spent collecting the information (especially communication with stakeholders) and verifying that the code you wrote didn't break anything.
> After all the effort getting to the point where the generated code is acceptable, one has to wonder, why not just write it yourself?
You know, I would often ask myself that very question...
Then I discovered the stupid robots are good at designing a project: you ask them to produce a design document, argue over it with them for a while, make revisions and changes, explore new ideas, then, finally, ask them to produce the code. It's like being able to interact with the yaks you're trying to shave; what's not to love about that?
Translation: His company will launch "AI" products in order to get funding or better compete with Valkey.
I find it very sad that people who have been really productive without "AI" now go out of their way to find small anecdotal evidence for "AI".
I find it even more sad when people come out of the woodwork on every LLM post to tell us that our positive experiences using LLMs are imagined and we just haven’t realized how bad they are yet.
Some people got into coding to code, rather than build things.
If the AI is doing the coding then that is a threat to some people. I am not sure why, LLMs can be good and you can enjoy coding...those things are unrelated. The logic seems to be that if LLMs are good then coding is less fun, lol.
Software jobs pay more than artist jobs because coding builds things. You can still be a code artist on your own time. Nobody is stopping you from writing in assembler.
¯\_(ツ)_/¯ people didn't stop playing chess because computers were better at it than them
And chess players stream as their primary income, because there's no money in Chess unless you're exactly the best player in the world (and even then the money is coming from sponsors/partners, not from chess itself).
We don't just tell you they were imagined, we can provide receipts.
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
Cursor is an old way of using LLMs.
Not to mention that in the study, less than half had ever used it before the study.
The AI tooling churn is so fast that by the time a study comes out people will be able to say "well they were using an older tool" no matter what tool that the study used.
It's the eternal future. "AI will soon be able to...".
There's an entire class of investment scammers that string along their marks, claiming that the big payoff is just around corner while they fleece the victim with the death of a thousand cuts.
Not really. Chatting with an LLM was cutting edge for 3 years; it's only within the last 8-10 months, with Claude Code and Gemini CLI, that we have the next big change in how we interact with LLMs.
How are Claude Code and Gemini CLI any different from using Cursor in agent mode? It's basically the same exact thing.
Claude Code was released in May.
Yup. But they are improvements over what cursor was releasing over the last year or so.
If there are paradigm-shattering improvements every six months, every single study that is ever released will be "behind" or "use an older tool." In six months when a study comes out using Claude Code, people dissatisfied with it will be able to point to the newest hotness, ad infinitum.
Could it not be that those positive experiences are just shining a light that the practices before using an LLM were inefficient? It’s more a reflection on the pontificator than anything.
Sure, but even then the perspective makes no sense. The common argument against AI at this point (e.g. OP) is that the only reason people use it is because they are intentionally trying to prop up high valuations - they seem unable to understand that other people have a different experience than they do. You’d think that just because there are some cases where it doesn’t work doesn’t necessarily mean that 100% of it is a sham. At worst it’s just up to individual taste, but that doesn’t mean everyone who doesn’t share your taste is wrong.
Consider cilantro. I’m happy to admit there are people out there who don’t like cilantro. But it’s like the people who don’t like cilantro are inventing increasingly absurd conspiracy theories (“Redis is going to add AI features to get a higher valuation”) to support their viewpoint, rather than the much simpler “some people like a thing I don’t like”.
"Redis for AI is our integrated package of features and services designed to get your GenAI apps into production faster with the fastest vector database."
Tautologically so! That doesn't show that LLMs are useless, it perfectly shows how they are useful.
If LLMs were actually useful, there would be no need to scream it everywhere. On the contrary: it would be a guarded secret.
posting a plain text description of your experience on a personal blog isn't exactly screaming. in the noise of the modern internet this would be read by nobody if it wasn't coming from one of the most well known open source software creators of all time.
people who believe in open source don't believe that knowledge should be secret. i have released a lot of open source myself, but i wouldn't consider myself a "true believer." even so, i strongly believe that all information about AI must be as open as possible, and i devote a fair amount of time to reverse engineering various proprietary AI implementations so that i can publish the details of how they work.
why? a couple of reasons:
1) software development is my profession, and i am not going to let anybody steal it from me, so preventing any entity from establishing a monopoly on IP in the space is important to me personally.
2) AI has some very serious geopolitical implications. this technology is more dangerous than the atomic bomb. allowing any one country to gain a monopoly on this technology would be extremely destabilizing to the existing global order, and must be prevented at all costs.
LLMs are very powerful, they will get more powerful, and we have not even scratched the surface yet in terms of fully utilizing them in applications. staying at the cutting edge of this technology, and making sure that the knowledge remains free, and is shared as widely as possible, is a natural evolution for people who share the open source ethos.
If consumer "AI", and that includes programming tools, had real geopolitical implications it would be classified.
The "race against China" is a marketing trick to convince senators to pour billions into "AI". Here is who is financing the whole bubble to a large extent:
https://time.com/7280058/data-centers-tax-breaks-ai/
In my experience, devs generally aren't secretive about tools they find useful.
People are insane; you can artificially pine for the simpler, better times made up in your mind, when you could give Oracle all your money.
But I would stake my very life on the fact that the movement by developers we call open-source is the single greatest community and ethos humanity has ever created.
Of course it inherits from enlightenment and other thinking, it doesn't exist in a vacuum, but it is an extension of the ideologies that came before it.
I challenge anyone to come up with any single modern subculture that has tangibly generated more, that touches more lives, moves more weight, travels farther, affects humanity more every single day from the moment they wake up, than the open source software community (in the catholic sense, obviously).
Both in moral goodness and in measurable improvement in standard of living and understanding of the universe.
Some people's memories are very short indeed; all who pine, pine for who they imagined they were, and are consumed by a memetic desire for their imagined selves.
> open-source is the single greatest community and ethos humanity has ever created
good lord.
If Internet was actually useful there would be no need to scream it everywhere. Guess that means the internet is totally useless?
> If LLMs were actually useful, there would be no need to scream it everywhere. On the contrary: it would be a guarded secret.
LLMs are useful—but there’s no way such an innovation should be a “guarded secret” even at this early stage.
It’s like saying spreadsheets should have remained a secret when they amplified what people could do when they became mainstream.
So ironic that you post this on Hacker News, where there are regularly articles and blog posts about lessons from the industry, both good and bad, that would be helpful to competitors. This industry isn’t exactly Coke guarding its secret recipe.
I think many devs are guarding their secrets, but the last few decades have shown us that an open foundation can net huge benefits for everyone (and then you can put your secret sauce in the last mile.)
Did you read my post? I hope you didn’t.
This post has nothing to do with Redis and is even a follow up to a post I wrote before rejoining the company.
OP, as a free user of Gemini 2.5 Pro via AI Studio, my friend has been hit by the equivalent of a car crash for approximately 3 weeks now. I hope they can recover soon; it is not easy for them.
This is HN. We don't read posts here.
Amen. I have to confess that I made an exception here though. This may be the first submission I read before going into the comments in years.
Please send your thoughts and prayers to Gemini 2.5 Pro; hopefully they can recover and get well soon enough. I hope Google lets them out of the hospital soon and discharges them; the last 3 weeks have been hell for me without them there.
IMO Claude Code was a huge step up. We have a large and well-structured Python code base revolving mostly around a large and complicated adapter pattern, and Claude is almost fully capable of implementing a new adapter if given the right prompt/resources.
Have used Claude's GitHub action quite a bit now (10-20 issue implementations, a bit more for PR reviews), and it is hit and miss, so I agree with the enhanced coding rather than just letting it run loose.
When the change is very small, self-contained feature/refactor it can mostly work alone, if you have tests that cover the feature then it is relatively safe (and you can do other stuff because it is running in an action, which is a big plus...write the issue and you are done, sometimes I have had Claude write the issue too).
When it gets to a more medium size, it will often produce something that will appear to work but actually doesn't. Maybe I don't have test coverage and it is my fault but it will do this the majority of the time. I have tried writing the issue myself, adding more info to claude.md, letting claude write the issue so it is a language it understands but nothing works, and it is quite frustrating because you spend time on the review and then see something wrong.
And anything bigger, unsurprisingly, it doesn't do well.
PR reviews are good for small/medium tasks too. Bar is lower here though, much is useless but it does catch things I have missed.
So, imo, still quite a way from being able to do things independently. For small tasks, I just get Claude to write the issue, and wait for the PR...that is great. For medium (which is most tasks), I don't need to do much actual coding, just directing Claude...but that means my productivity is still way up.
I did try Gemini but I found that when you let it off the leash and accept all edits, it would go wild. We have Copilot at work reviewing PRs, and it isn't so great. Maybe Gemini is better on large codebases where, I assume, Claude will struggle.
What is the overall feedback loop with LLMs writing code? Do they learn as they go like we do? Do they just learn from reading code on GitHub? If the latter, what happens as less and less code gets written by human experts? Do the LLMs then stagnate in their progress and start to degrade? Kind of like making analog copies of analog copies of analog copies?
Code and math are similar to chess/go, where verification is (reasonably) easy so you can generate your own high-quality training data. It's not super straightforward, but you should still expect more progress in coming years.
Unlike OP, from my still limited but intense month or so diving into this topic so far, I had better luck with Gemini 2.5 PRO and Opus 4 on a more abstract level, like architecture, and then handing the input to Sonnet for coding. I found 2.5 PRO, and to a lesser degree Opus, were hit or miss; a lot of instances of them circling around the issue and correcting themselves when coding (Gemini especially so), whereas Sonnet would cut to the chase, but needed an explicit take on it to be efficient.
Totally possible. In general I believe that while more powerful in their best outputs, Sonnet/Opus 4 are in other ways (alignment / consistency) a regression on Sonnet 3.5v2 (often called Sonnet 3.6), as Sonnet 3.7 was. Also models are complex objects, and sometimes in a given domain a given model that on paper is weaker will work better. And, on top of that: interactive use vs agent requires different reinforcement learning training that sometimes may not be towards an aligned target... So also using the model in one way or the other may change how good it is.
This is my experience too. I usually use Gemini 2.5 Pro through AI Studio for big design ideas that need to be validated and refined. Then I take the refined requirements to Claude Code, which does an excellent job most of the time of coding them properly. Recently I tried Gemini CLI, and it's not even close to Claude Code's sharp coding skills. It often makes syntax mistakes and gets stuck trying to get itself out of a rut; its output is so verbose (and fast) that it's hard to follow what it's trying to do. Claude Code has a much better debugging capability.
Another contender in the "big idea" reasoning camp: DeepSeek R1. It's much slower, but most of the time it can analyze problems and get to the correct solution in one shot.
I have found that if I ask the LLM to first _describe_ to me what it wants to do without writing any code, then the subsequent code generated has much higher quality. I will ask for a detailed description of the things it wants to do, give it some feedback and after a couple of iterations, tell it to go ahead and implement it.
“Always be part of the loop by moving code by hand from your terminal to the LLM web interface: this guarantees that you follow every process. You are still the coder, but augmented.”
I agree with this, but this is why I use a CLI. You can pipe files instead of copying and pasting.
Yeah it is also a bit of a shibboleth: vibes coding, when I'm productive for the 80% case with Claude code, is about the LLM cranking for 10-20min. I'm instructing & automating the LLM on how to do its own context management, vs artisanally making every little decision.
Ex: Implementing a spec, responding to my review comments, adding wider unit tests, running a role play for usability testing, etc. The main time we do what he describes of manually copying into a web ide is occasionally for a better short use of a model, like only at the beginning of some plan generation, or debug from a bunch of context we have done manually. Like we recently solved some nasty GPU code race this way, using a careful mix of logs and distributed code. Most of our job is using Boring Tools to write Boring Code, even if the topic/area is neato: you do not want your codebase to work like an adventure for everything, so we invest in making it look boring.
I agree with what the other commenter said: I manage context as part of the skill, but by making the AI do it. Doing that by hand is like slowly handcoding assembly. Instead, I'm telling Claude Code to do it. Ex: download and crawl some new dependency I'm using for some tricky topic, or read in my prompt template markdown for some task, or generate and self-maintain some plan.md with high-level rules on context I defined. This is the 80% case.
Maybe one of the disconnects is task latency vs throughput as trade-offs in human attention. If I need the LLM to get to the right answer faster, so the task is done faster, I have to lean in more. But my time is valuable and I have a lot to do. I'd rather spend 50% less of my time per task, even if the task takes 4x longer, with the LLM spinning longer. In that saved human time, I can be working on another task: I typically have 2-3 terminals running Claude, so I only check in every 5-15min.
Your strategy only works for some domains.
Totally
We do this ~daily for:
* Multitier webapps
* DevOps infrastructure: docker, aws, ci systems, shell scripts, ...
* Analytics & data processing
* AI investigations (logs, SIEMs, ..) <--- what we sell!
* GPU kernels
* Compilers
* Docs
* Test amplification
* Spec writing
I think ~half the code written by professional software engineers fits into these, or other vibes-friendly domains. The stuff antirez does with databases seems close to what we do with compilers, GPU kernels, and infra.
We are still not happy with the production-grade frontend side of coding, though by being strong on API-first design and keeping logic vs UI separated, most of our UI code is friendly to headless.
I currently use LLMs as a glorified Stack Overflow. If I want to start integrating an LLM like Gemini 2.5 PRO into my IDE (I use Visual Studio Code), whats the best way to do this? I don't want to use a platform like Cursor or Claude Code which takes me away from my IDE.
Cursor is an IDE. You can use its powerful (but occasionally wrong) autocomplete, and start asking it to do small coding tasks using the Ctrl+L side window.
I don't want to leave my IDE
I don't either but unfortunately Cursor is better than all the other plugins for IDEs like JetBrains. I just tab over to cursor and prompt it, then edit the code in my IDE of choice.
Does running a Claude Code command in VSCode's integrated terminal count as leaving your IDE?
(We may have differing definitions of "leaving" ones IDE).
Worth noting that Cursor is a VS Code fork and you can copy all of your settings over to it. Not saying that you have to, of course, but that it's perhaps not as different as you might be imagining.
GitHub Copilot is pretty easy to try within VS Code
I want to use Gemini 2.5 PRO. I was an early tester of Copilot and it was awful.
https://docs.github.com/en/copilot/reference/ai-models/suppo...
Thank you! When I was testing out Copilot I was stuck with whatever default LLM was being used. Didn't realize you could switch it out for a non-MS/OpenAI model.
Copilot has 2.5 Pro in the settings in github.com, along with claude 4
My question on all of the "can't work with big codebases" is: what would a codebase that was designed for an LLM look like? Composed of many, many small functions that can be composed together?
I believe it’s the same as for humans: different files implementing different parts of the system with good interfaces and sensible boundaries.
Well documented helps a lot too.
You can use an LLM to help document a codebase, but it's still an arduous task because you do need to review and fix up the generated docs. It will make, sometimes glaring sometimes subtle, mistakes. And you want your documentation to provide accuracy rather than double down on or even introduce misunderstanding.
this is a common pattern I see -- if your codebase is confusing for LLMs, it's probably confusing for people too
This fact is one of the most pleasant surprises I’ve had during this AI wave. Finally, a concrete reason to care about your docs and your code quality.
And on top of that - can you steer an LLM to create this kind of code? In my experience the models don’t really have a „taste” for detecting complexity creep and reengineering for simplicity, in the same way an experienced human does.
I am vibe coding a complex app. You can certainly keep things clean but the trick is to enforce a rigid structure. This does add a veneer of complexity but simplifies " implement this new module" or "add this feature across all relevant files".
I found that it is beneficial to create more libraries. If I, for example, build a large integration with an API (basically a whole API client), I would in the past have kept it in the same repo, but now I make it a standalone library.
And my question to that is how would that be different from a codebase designed for humans?
I think it means finer toplevel granularity re: what's runnable/testable at a given moment. I've been exploring this for my own projects and although it's not a silver bullet, I think there's something to it.
----
Several codebases I've known have provided a three-stage pipeline: unit tests, integration tests, and e2e tests. Each of these batches of tests depend on the creation of one of three environments, and the code being tested is what ends up in those environments. If you're interested in a particular failing test, you can use the associated environment and just iterate on the failing test.
For humans with a bit of tribal knowledge about the project, humans who have already solved the get-my-dev-environment-set-up problem in more or less uniform way, this works ok. Humans are better at retaining context over weeks and months, whereas you have to spin up a new session with an LLM every few hours or so. So we've created environments for ourselves that we ignore most of the time, but that are too complex to be bite sized for an agent that comes on the scene as a blank slate every few hours. There are too few steps from blank-slate to production, and each of them is too large.
But if successively more complex environments can be built on each other in arbitrarily many steps, then we could achieve finer granularity. As a nix user, my mental model for this is function composition where the inputs and outputs are environments, but an analogous model would be layers in a docker files where you test each layer before building the one on top of it.
Instead of maybe three steps, there are eight or ten. The goal would be to have both whatever code builds the environment, and whatever code tests it, paired up into bite-sized chunks so that a failure in the pipeline points you a specific stage which is more specific that "the unit tests are failing". Ideally test coverage and implementation complexity get distributed uniformly across those stages.
Keeping the scope of the stages small maximizes the amount of your codebase that the LLM can ignore while it works. I have a flake output and nix devshell corresponding to each stage in the pipeline and I'm using pytest to mark tests based on which stage they should run in. So I run the agent from the devshell that corresponds with whichever stage is relevant at the moment, and I introduce it to only the tests and code that are relevant to that stage (the assumption being that all previous stages are known to be in good shape). Most of the time, it doesn't need to know that it's working on stage 5 of 9, so it "feels" like a smaller codebase than it actually is.
If evidence emerges that I've engaged the LLM at the wrong stage, I abandon the session and start over at the right level (now 6 of 9 or somesuch).
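A rough sketch of that stage-marking idea with pytest (the marker names and stage layout are invented for illustration):

    # conftest.py (sketch): register one marker per pipeline stage.
    def pytest_configure(config):
        for stage in ("unit", "db", "api", "e2e"):
            config.addinivalue_line("markers", f"stage_{stage}: tests for the {stage} stage")

    # test_orders.py (sketch): bind a test to a single stage.
    import pytest

    @pytest.mark.stage_db
    def test_order_roundtrip(db_connection):
        assert db_connection is not None  # placeholder assertion

Each devshell then runs something like "pytest -m stage_db", so the agent only ever sees the slice of the suite (and code) that its stage owns.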
like a microservice architecture? overall architecture to get the context and then dive into a micro one?
Thanks for writing this article.
I used a similar setup until a few weeks ago, but coding agents became good enough recently.
I don’t find context management and copy pasting fun, I will let GitHub Copilot Insiders or Claude Code do it. I’m still very much in the loop while doing vibe coding.
Of course it depends on the code base, and Redis may not benefit much from coding agents.
But I don’t think one should reject vibe coding at this stage, it can be useful when you know what the LLMs are doing.
I find agentic coding to be best when using one branch per conversation. Even if that conversation is only a single bugfix, branch it. Then do 2 or 3 iterations of that same conversation across multiple branches and choose the best result of the 3 and destroy the other two.
I’m super curious to see the reactions in the comments.
antirez is a big fuggin deal on HN.
I’m sort of curious if the AI doubting set will show up in force or not.
Lovely post @antirez. I like the idea that LLMs should be directly accessing my codebase and there should be no agents in between. Basically no software that filters what the LLM sees.
That said, are there tools that make going through a codebase easier for LLMs? I guess tools like Claude Code simply grep through the codebase and find out what Claude needs. Is that good enough or are there tools which keep a much more thorough view of the codebase?
Sorry if I missed it in the article — what’s your setup? Do you use a CLI tool like aider or are you using an IDE like cursor?
Terminal with vim on one side, the official web interface of the model on the other side. The pbcopy utility to pass stuff to the clipboard. I believe models should be used in their native interface: when there are other layers, sometimes the model served is not exactly the same, other times it misbehaves because of RAG, and in general you have no exact control of the context window.
This seems like a lot of work depending upon the use case. e.g. the other day I had a bunch of JSON files with contact info. I needed to update them with more recent contact info on an internal Confluence page. I exported the Confluence page to a PDF, then dropped it into the same directory as the JSON files. I told Claude Code to read the PDF and use it to update the JSON files.
It tried a few ways to read the PDF before coming up with installing PyPDF2, using that to parse the PDF, then updated all the JSON files. It took about 5 minutes to do this, but it ended up 100% correct, updating 7 different fields across two dozen JSON files.
(The reason for the PDF export was to get past the Confluence page being behind Okta authentication. In retrospect, I probably should've saved the HTML and/or let Claude Code figure out how to grab the page itself.)
How would I have done that with Gemini using just the web interface?
He uses vim and copy-pastes code from web interfaces because he wants to maintain control and understanding of the code. You can find proof of this setup on his youtube channel [https://www.youtube.com/@antirez]
Thanks. Also, based on the coding rig you use, models may not match the performance of what is served via the web. Or may not be as cheap. For instance, the Gemini 2.5 Pro $20 account is very hard to saturate with queries.
Can anyone recommend a workflow / tools that accomplishes a slightly more augmented version of antirez’ workflow & suggestions minus the copy-pasting?
I am on board to agree that pure LLM + pure original full code as context is the best path at the moment, but I’d love to be able to use some shortcuts like quickly applying changes, checkpoints, etc.
My persistent (and not unfounded?) worry is that all the major tools & plugins (Cursor, Cline/Roo) all play games with their own sub-prompts and context “efficiency”.
What’s the purest solution?
You can actually just put Cursor in manual mode and it's the same thing. You 100% manage the context and there's no agentic loop.
If your codebase fits in the context window, you can also just turn on "MAX" mode and it puts it all in the context for you.
Claude Code has worked well for me. It is easy to point it to the relevant parts of the codebase and see what it decides to read itself so you provide missing piece of code when necessary.
This is almost the opposite of what OP is asking, and what the post from antirez describes.
Since I’ve heard Gemini-cli is not yet up to snuff, has anyone tried opencode+gemini? I’ve heard that with opencode you can login with Google account (have NOT confirmed this, but if anyone has any experience, pls advise) so not sure if that would get extra mileage from Gemini’s limits vs using a Gemini api key?
This matches my take, but I'm curious if OP has used Claude code.
Yep, when I use agents I go for Claude Code. For example I needed to buy too many Commodore 64 than appropriate lately, and I let it code a Telegram bot advising me when popular sources would have interesting listings. It worked (after a few iterations), then I looked at the code base and wanted to puke, but who cares in this case? It worked, and it was much faster, and I had zero to learn in the process of doing it myself. I published a Telegram library for C in the past and know how it works and how to do scraping and so forth.
> For example I needed to buy too many Commodore 64 than appropriate lately
Been there, done that!
For those one-off small things, LLMs are rather cool. Especially Claude Code and Gemini CLI. I was given an archive of some really old movies recently, but the files were bearing title names in Croatian instead of the original (mostly English) ones. So I claude --dangerously-skip-permissions into the directory with movies and in a two-sentence prompt I asked it to rename files into a given format (that I tend to have in my archive) and for each title to find the original name and year of release and use it in the file... but, before committing the rename, to give me a list of before and after for approval. It took like what, a minute of writing a prompt.
Now, for larger things, I'm still exploring a way, an angle, what and how to do it. I've tried everything from yolo prompting to structured and uber-structured approaches, all the way to mimicking product/PRD - architecture - project management / tasks - developer/agents... So far, unless it's a rather simple project, I don't see it happening that way. The most luck I had was "some structure" as context and inputs, and then guided prompting during sessions and reviewing stuff. Almost pair-programming.
> ## Provide large context
I thought large contexts are not necessarily better and sometimes have the opposite effect?
LLMs performance will suffer from both insufficient context and context flooding. Balancing is an art.
I found it depends very much on the task. For "architect" sessions you need as much context as you can reasonably gather. The more the merrier. At least gemini2.5 pro will gather the needed context from many files and it really does make a difference when you can give it a lot of it.
On coding you need to aggressively prune it, and only give minimum adjacent context, or it'll start going on useless tangents. And if you get stuck just refresh and start from 0, changing what is included. It's often faster than "arguing" with the LLM in multi-step sessions.
(the above is for existing codebases. for vibe-coding one-off scripts, just go with the vibes, sometimes it works surprisingly well from a quick 2-3 lines prompt)
> Coding activities should be performed mostly with: Claude Opus 4
I've been going down to Sonnet for coding over Opus. Maybe I am just writing dumb code.
In my experience as well, Sonnet 4 is much better than Opus. Opus is great at the start of a project, where you would need to plan things, structure the project, figure out how to execute, but it cannot beat Sonnet at actually executing it. It is also a lot cheaper.
That is also what Anthropic recommends. In edge cases use Opus.
Opus is also way more expensive. (Don’t forget to switch back to Sonnet in all terminals)
Most of the time Sonnet 4 just works, but you need to refine the context as much as you can.
OP, I think Gemini 2.5 Pro is in the hospital and has been recovering for the last 2 weeks; let's all wish our good friend a good recovery and hope they can get back to their normal selves.