I'm kind of dubious it's effective in any term whatsoever, unless the term is "nothing works, but we got a lot of it".
diz
I think if people are still citing this in another 3 months’ time, they’ll be making a mistake.
In 3 months they'll think they're 40% faster while being 38% slower. And sometime in 2026 they will be exactly 100% slower - the moment referred to as "technological singularity".
Yeah, the glorious future where every half-as-good-as-expert developer is now only 25% as good as an expert (a level of performance also known as being "completely shit at it"), but he's writing 10x the amount of unusable shitcode.
I think more low tier output would be a disaster.
Even pre-AI, I had to deal with a project where they had shoved testing and compliance onto juniors for a long time. What a fucking mess it was. I had to go through every commit mentioning Coverity, because they'd had a junior fixing Coverity-flagged "issues". I spent at least 2 days debugging a memory corruption crash caused by one such "fix", and then who knows how long reviewing every other such "fix".
And don't get me started on the tests. 200+ tests, and none of them caught several regressions in the handling of parameters that are shown early in the frigging how-to. Not some obscure corner case: the stuff you immediately run into if you just follow the documentation.
With AI, all those numbers would be much larger: more commits "fixing Coverity issues" (and, worse yet, fixing "issues" that the LLM sees in the code), more so-called "tests" that don't actually flag any real regressions, etc.
I suspect that the kind of people who would "know how to use it" don't use it right now, since it has not yet reached "useful if you know how to use it" status.
Software work is dominated by the fat-tailed distribution of the time it takes to figure out and fix a bug, not by typing code. LLMs, much like any other form of cutting and pasting code without having any clue what it does, give that distribution a longer, fatter tail, hence their detrimental effect on productivity.
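A toy model of that argument, under an assumption the post itself doesn't make: that the time spent on a bug follows a Pareto distribution with minimum `x_min` and tail index `alpha` (smaller `alpha` means a fatter tail). For such a distribution the mean is `alpha * x_min / (alpha - 1)` when `alpha > 1`, and the worst fraction `p` of bugs eats `p ** (1 - 1/alpha)` of the total time. The function names and the `alpha` values are hypothetical and purely illustrative, not measurements of anything.

```python
import math


def pareto_mean(x_min: float, alpha: float) -> float:
    """Mean time per bug under a Pareto(x_min, alpha) model; infinite when alpha <= 1."""
    return math.inf if alpha <= 1 else alpha * x_min / (alpha - 1)


def top_share(p: float, alpha: float) -> float:
    """Share of total debugging time spent on the worst fraction p of bugs.

    Follows from the Pareto Lorenz curve L(F) = 1 - (1 - F) ** (1 - 1/alpha).
    """
    return p ** (1 - 1 / alpha)


if __name__ == "__main__":
    # Hypothetical tail indices, getting fatter from left to right.
    for alpha in (2.0, 1.5, 1.2):
        print(
            f"alpha={alpha}: mean={pareto_mean(1.0, alpha):.2f}h, "
            f"worst 10% of bugs = {top_share(0.1, alpha):.0%} of debugging time"
        )
```

In this toy model, going from `alpha=2` to `alpha=1.2` triples the mean time per bug, and the worst 10% of bugs go from about a third to about two thirds of all debugging time. Faster typing never touches that term.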
And the other "nuanced" take, common on my LinkedIn feed, is that people who learn how to use (useless) AI are gonna replace everyone with their much-increased productive output.
Even if AI becomes not so useless, the only people whose productivity will actually improve are the people who aren't using it now (because they correctly notice that it's a waste of time).
That philosophy always ends in stepping into dogshit to try to boost stock prices.
When they tested on bugs not in SWE-Bench, the success rate dropped to 57‑71% on random items, and 50‑68% on fresh issues created after the benchmark snapshot. I’m surprised they did that well.
After the benchmark snapshot, but that could still be before the LLM's training data cutoff, or available via RAG.
edit: For a fair test, you have to use GitHub issues that had not yet been resolved by a human.
This is how these fuckers talk, all of the time. Also see Sam Altman's not-quite-denials of training on Scarlett Johansson's voice: they just asserted that they had hired a voice actor, but didn't deny training on Scarlett Johansson's actual voice.
edit: because anyone with half a brain knows that not only did they train on her actual voice, they probably gave it, and their other pirated movie soundtracks, massively higher weighting, just as they did for books and NYT articles.
Anyhow, I fully expect that by now they just use everything they can to cheat benchmarks, up to and including RAG from solutions past the training data cutoff date. With two of the paper's authors being from Microsoft itself, expect that their "fresh issues" are gamed too.
Yeah, I'm thinking that people who think their brains work like an LLM may be somewhat correct. Still wrong in some ways, as even their brains learn from several orders of magnitude less data than LLMs do, but close enough.
You can film with an actual camera, then use video-to-video to make it look very AI. If you're just grifting, that would be the way to go, I think.
They're also very gleeful about finally having one-upped the experts with one weird trick.
Up until AI, they were the people who were inept and late at adopting new technology, and now they get to feel that they're ahead (because this time the new half-assed technology was pushed onto them and they didn't figure out they needed to opt out).
Oh wow, it is precisely the problem I "predicted" before: there are surprisingly few production-grade implementations to plagiarize from.
Even for seemingly simple stuff. You might think parsing floating-point numbers from strings would have a gazillion examples. But it is quite tricky to do correctly (a correct implementation lets you convert a floating-point number to a string with enough digits and back, and always get precisely the number you started with). So even for such an omnipresent example, which has probably been implemented well over 10 000 times by various students, if you keep pestering your bot to make it better, or have the bots write tests and then pass them, you could end up plagiarizing something identifiable.
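A minimal Python sketch of why the round-trip property is harder than it looks. `naive_parse` and `exact_parse` are hypothetical names for illustration. The naive one is the kind of digit-by-digit float accumulation students often write; it rounds at every step and already fails on "0.3". The exact one is correct but slow: it assumes CPython's correctly rounded int-to-float conversion and int/int true division, so each parse rounds exactly once. It is not how production parsers do it.

```python
import math
import random
import struct


def naive_parse(s: str) -> float:
    """Naive parser for non-negative plain decimals like "12.375".

    Each fractional digit is added with float arithmetic, so every step
    rounds and the errors accumulate: "0.3" comes out as 0.30000000000000004.
    """
    int_part, _, frac_part = s.partition(".")
    value = float(int(int_part or "0"))
    scale = 0.1  # 0.1 itself is already inexact in binary
    for ch in frac_part:
        value += int(ch) * scale
        scale /= 10
    return value


def exact_parse(s: str) -> float:
    """Correctly rounded decimal -> binary64 conversion using exact integers.

    Accepts an optional sign, a decimal point and an e/E exponent (no inf/nan).
    Relies on CPython rounding int -> float and int / int correctly;
    out-of-range inputs raise OverflowError instead of returning inf.
    """
    s = s.strip().lower()
    sign = -1.0 if s.startswith("-") else 1.0
    s = s.lstrip("+-")
    mantissa, _, exp_part = s.partition("e")
    int_part, _, frac_part = mantissa.partition(".")
    # All significant digits as one exact integer, with the decimal exponent adjusted.
    digits = int((int_part + frac_part) or "0")
    exponent = (int(exp_part) if exp_part else 0) - len(frac_part)
    if exponent >= 0:
        value = float(digits * 10 ** exponent)  # exactly one rounding step
    else:
        value = digits / 10 ** (-exponent)  # exactly one rounding step
    return sign * value


def round_trips(x: float) -> bool:
    """The property from the post: print with enough digits, parse back, get x."""
    # repr() gives the shortest round-tripping string; 17 significant digits always suffice.
    return exact_parse(repr(x)) == x and exact_parse(format(x, ".17g")) == x


if __name__ == "__main__":
    print(naive_parse("0.3"), exact_parse("0.3"))  # 0.30000000000000004 0.3
    rng = random.Random(0)
    for _ in range(10_000):
        # Random bit patterns cover subnormals, huge exponents and both signs.
        x = struct.unpack("<d", struct.pack("<Q", rng.getrandbits(64)))[0]
        if math.isfinite(x):
            assert round_trips(x), x
    print("all round trips exact")
```

Getting this right with a single rounding is easy only because Python hands you arbitrary-precision integers; doing it fast, without bignums on every input, is where the genuinely tricky and comparatively rare implementations live.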
edit: and even supposing there were 2, or 3, or 5 exFAT implementations, they would be too different to "blur" together. The deniable plagiarism that they are trying to sell, "it learns the answer in general from many implementations, then writes original code", is bullshit.