You’ve been given evidence that people cannot trust their own perceptions of what these agents do, and you replied by telling a bunch of stories about why you think you personally can trust your perceptions. My 12-year-old did the same thing when I tried to explain this to them.
You asked for data. I (probably) can’t give you the data, so I gave you what I could: a few things gleaned from both objective data (collected from a significant number of engineers) and my own anecdotal experience. You are free to disregard it, and I wouldn’t even blame you. There are lots of fools on the internet, and there’s a decent chance that I’m just another one 🙂.
Engineers being spread thinner to manage a wider number of tasks whilst reviewing shitty LLM noise that they didn’t write is inevitably going to make horrible code that’s impossible to maintain and will cost massive amounts of time and resources in the long run.
This was true a year ago. Even like seven months ago.
Hell, even three months ago, I would have agreed with you a LOT more than I do today – mostly because I was forced to learn these things in more depth quite recently. “Shitty LLM noise” is a very early part of the learning curve. In a way, it’s similar to “Hello world.” Discard it and figure out how to get more useful results.
In many companies that have adopted AI, engineers are still responsible for their code. Any slop in the codebase is the fault of the engineer that introduced it (and the engineer[s] that reviewed it), regardless of whether it’s hand-written or generated. So far, I have not seen anyone merge unmaintainable, “shitty LLM noise” into enterprise codebases – that would be very risky. (It probably happens in other places, like Microsoft; I just haven’t seen it myself. It would be unacceptable.)
Anyway, you’ll see all this eventually, when some data gets published. I’d gain nothing by convincing anyone of this, so I won’t try 🙂.
This is just a statement of faith in your ability to judge these things accurately. Nowhere in here do I see any evidence that you’ve even considered that the reason you’ve changed your attitude towards the tech is that it’s just gotten so good at fooling people that it’s finally got you.
You don’t gain much from trying to convince me, but you could gain a lot from being more sceptical. People invented science to address the fact that our intuitive understanding doesn’t always reflect reality.
Science and the collection of objective data stop us from doing this:
There are a bunch of things that our brains just don’t understand intuitively, so we need to check our intuition against measurement. There’s no shame in that, but when it’s pointed out, then you have a chance to check yourself.
But you don’t seem to understand that. When you say:
Anyway, you’ll see all this eventually, when some data gets published.
you are demonstrating that you are the perfect mark for this stuff, because you are not reflecting on your own thought process to see where it might be failing you.
This is just a statement of faith in your ability to judge these things accurately. Nowhere in here do I see any evidence that you’ve even considered that the reason you’ve changed your attitude towards the tech is that it’s just gotten so good at fooling people that it’s finally got you.
Yet in all of your replies, you seem to have assumed early on that I’ve been fooled, based on outdated data. Do you assume that newer data just doesn’t exist anywhere, and that I’m lying about it? (To be clear: I wouldn’t blame you. There’s an old proverb: “Believe nothing you hear, and only half of what you see,” or something like that.)
you could gain a lot from being more sceptical
Another assumption: that I wasn’t skeptical.
Anyway, the rest of your reply continues with the assumption that there was no data or objectivity on my part, so I won’t keep beating a dead horse. Just wait for newer data. It might be old by the time you see it, but still useful.
Edit: I suppose the number of recent layoffs might be useful (or at least interesting) data. Suddenly many different, unrelated companies had too many engineers – quite a contrast to the engineer shortage just a few years ago. Correlation ≠ causation and all, but interesting nonetheless.
Edit 2: I just noticed this paragraph in that link you shared:
And even for complex coding projects like the ones studied, the researchers are also optimistic that further refinement of AI tools could lead to future efficiency gains for programmers. Systems that have better reliability, lower latency, or more relevant outputs (via techniques such as prompt scaffolding or fine-tuning) “could speed up developers in our setting,” the researchers write. Already, they say there is “preliminary evidence” that the recent release of Claude 3.7 “can often correctly implement the core functionality of issues on several repositories that are included in our study.”
Claude 3.7 was released in February 2025. Also, I highly doubt 3.7 was good enough to make engineers more productive, overall (though I don’t have data on those old models). Relative to the speed of evolution of LLMs, harnesses, and people’s skills in using them, the data behind this article is ancient.
Edit 3:
In that article you shared, they link to the study in the second paragraph. Follow that link, and you’ll see this at the top:
Update: In February 2026, we published new data on the productivity impact of late-2025 AI tools.
There were selection effects in the follow-up study, but it seemed worth mentioning anyway.
It wasn’t an assumption:
Anyway, you’ll see all this eventually, when some data gets published.
That is not a skeptical position.
And my point is that given that the data shows objectively that it does fool people - even subject matter experts - it is reasonable to believe that effect continues until proven otherwise. We know it’s a feature of LLMs, and the fact you continue to push on with your blind faith undisturbed by this knowledge is truly alarming.
In the follow-up study, first of all, I want to point out that it’s not definitive that it made them faster: the error bars include regions where they were slowed down. And none of this includes the long-term effects of poorly made, unmaintainable code that was farted out in bulk by an overworked engineer who didn’t have time to properly review code that they didn’t write and don’t fully understand.
It also doesn’t include the effects of long-term exposure to LLMs reducing their solo effectiveness. If you only measure the immediate delta, then it could look like the LLM is helping when actually it’s just making people dependent.
And the selected dev quotes are also alarming in light of that information:
“I’m torn. I’d like to help provide updated data on this question but also I really like using AI!” — a developer from the original study early-2025 when asked to participate in the late-2025 study.
“I found I am actually heavily biased sampling the issues … I avoid issues like AI can finish things in just 2 hours, but I have to spend 20 hours. I will feel so painful if the task is decided as AI-disallowed.” — a developer from the new study noting selection effects when choosing what tasks to include in the study.
“my head’s going to explode if I try to do too much the old fashioned way because it’s like trying to get across the city walking when all of a sudden I was more used to taking an Uber.” — a developer from the new study noting selection effects when choosing what tasks to include in the study.
These quotes don’t demonstrate that LLMs actually help, only that they are addictive, which we already know to be true. If you’ve ever tried to talk to an addict about their problem you’d recognise this language.
Especially the quote that they could do something in 2 hours with an LLM that would take 20 hours alone. That can’t be true, that person is definitely wrong about the effect of the LLM. If it were really that effective, LLM companies would be clamouring to show the data that proves how effective their products are. Why aren’t they?
The fact that this data is so hard to find and so hard to fund when there are so many billions being dumped into this field should tell you something. It should be deeply disturbing, but you just carry on fully convinced that you’re right and that there’s nothing to what I’m saying, even though you admitted you would’ve agreed just 3 months ago. Again, if you can actually show that the difference is so dramatic, then show it. You’re not, though. You’re just convinced that you don’t need to re-evaluate what you believe. That doesn’t say good things about where your head’s at.
If you truly weren’t trying to convince me, you could just stop. I don’t know what you’re trying to prove by continuing.
Anyway, you’ll see all this eventually, when some data gets published.
That is not a skeptical position.
You can’t possibly know what was in my head while going over internal data. But at this point, that horse is so beaten and so dead that it’s starting to stink.
And my point is that given that the data shows objectively that it does fool people - even subject matter experts - it is reasonable to believe that effect continues until proven otherwise. We know it’s a feature of LLMs, and the fact you continue to push on with your blind faith undisturbed by this knowledge is truly alarming.
In the follow-up study, first of all, I want to point out that it’s not definitive that it made them faster: the error bars include regions where they were slowed down. And none of this includes the long-term effects of poorly made, unmaintainable code that was farted out in bulk by an overworked engineer who didn’t have time to properly review code that they didn’t write and don’t fully understand.
I’m just gonna zoom in on this part for a sec:
unmaintainable code that was farted out in bulk by an overworked engineer who didn’t have time to properly review code that they didn’t write and don’t fully understand.
I fully agree that all of this is bad.
Don’t create or approve unmaintainable code. Avoid overworking engineers (this is more of a management problem). Make time to properly review and understand the code, request changes, even reject patches/PRs/MRs if it’s hot garbage, etc. These practices predate LLMs and should still be upheld today, regardless of which tools are used. I feel like I’ve already covered these things, though. These poor dead horses 🙁
Onward…
It also doesn’t include the effects of long-term exposure to LLMs reducing their solo effectiveness. If you only measure the immediate delta, then it could look like the LLM is helping when actually it’s just making people dependent.
And the selected dev quotes are also alarming in light of that information:
“I’m torn. I’d like to help provide updated data on this question but also I really like using AI!” — a developer from the original study early-2025 when asked to participate in the late-2025 study.
“I found I am actually heavily biased sampling the issues … I avoid issues like AI can finish things in just 2 hours, but I have to spend 20 hours. I will feel so painful if the task is decided as AI-disallowed.” — a developer from the new study noting selection effects when choosing what tasks to include in the study.
“my head’s going to explode if I try to do too much the old fashioned way because it’s like trying to get across the city walking when all of a sudden I was more used to taking an Uber.” — a developer from the new study noting selection effects when choosing what tasks to include in the study.
These quotes don’t demonstrate that LLMs actually help, only that they are addictive, which we already know to be true. If you’ve ever tried to talk to an addict about their problem you’d recognise this language.
Yep. I did mention that there were selection effects. (Also, I’ve talked to many hardcore drug and alcohol addicts. They’re very different 😆)
Especially the quote that they could do something in 2 hours with an LLM that would take 20 hours alone. That can’t be true, that person is definitely wrong about the effect of the LLM. If it were really that effective, LLM companies would be clamouring to show the data that proves how effective their products are. Why aren’t they?
*sigh* Alright, I guess we can zoom in on this one too…
Especially the quote that they could do something in 2 hours with an LLM that would take 20 hours alone. That can’t be true, that person is definitely wrong about the effect of the LLM.
This seems pretty speculative. Do you know what their task was? Do you know what the quality of the result was?
If it were really that effective, LLM companies would be clamouring to show the data that proves how effective their products are. Why aren’t they?
Ok, there are some more assumptions here…
companies would be clamouring to show the data that proves how effective their products are.
You assume they’re not?
Why aren’t they?
You assume that the only possible reason is that they can’t? Could there be no other possible reasons?
Here’s one possible explanation that comes to mind: There have been deals made between rival AI companies because more compute capacity simply does not exist yet. I doubt Anthropic was super happy to buy compute from xAI, but they’ve been continuously pissing off their non-enterprise customers by tightening usage limits, doing dumb things with their APIs to detect and bill different use cases (e.g., OpenClaw) at different rates, etc. They have to honor SLAs for their enterprise customers, so they’ve been throttling their less-profitable customers while scrambling to secure more compute power.
Onward…
The fact that this data is so hard to find and so hard to fund when there are so many billions being dumped into this field should tell you something. It should be deeply disturbing, but you just carry on fully convinced that you’re right and that there’s nothing to what I’m saying, even though you admitted you would’ve agreed just 3 months ago. Again, if you can actually show that the difference is so dramatic, then show it. You’re not, though.
Yeah I think I’ve made it pretty clear why. No need to bruise more horse carcasses.
Anyway, my apologies for splitting this paragraph. You finished this one out with another assumption:
You’re just convinced that you don’t need to re-evaluate what you believe. That doesn’t say good things about where your head’s at.
And then wrapping up with more assumptions:
If you truly weren’t trying to convince me, you could just stop. I don’t know what you’re trying to prove by continuing.
You assume that I’m trying to convince you of something? And that the only possible reason for me engaging with you is that I want to prove something?
I promise, I can’t sell you AI. Like, even if I had some crazy sales skills, I still physically couldn’t fulfill that sale. I only have one GPU in my home server, and it’s not even enough for myself. I have nothing to gain here.
Anyway, I think it might be more productive to just put this thread on ice. It’s starting to get a bit repetitive/circular at this point. Could be interesting to revisit this in a year though, to see how things have progressed 🙂
This is just a bunch of questions, but no answers, except for that whole paragraph about compute capacity whose point I simply cannot penetrate.
Just admit you don’t know that it makes people faster. You’re just dancing around that issue without saying anything.
EDIT:
Also:
Anyway, you’ll see all this eventually, when some data gets published.
That is not a skeptical position.
You can’t possibly know what was in my head while going over internal data.
Motherfucker, you told me what you were thinking and I took that at face value. You want to pretend that there’s some super-secret personal trove of knowledge that I can’t access that tells you the truth about what future data will say? Cool, I call that blind faith. I don’t know how you can pretend it’s anything else.