this post was submitted on 17 Jun 2025
96 points (100.0% liked)
TechTakes
One of the big AI companies (Anthropic, with Claude? Yep!) published a long paper detailing some common LLM issues, including why they do math wrong and then lie about it in "reasoning" mode.
It's actually pretty interesting, because you can't say they "don't know how to do math," exactly. The same stochastic mechanisms that let it fool people with written prose also let it do approximate math. That's why some digits come out correct, or it gets the order of magnitude right while still getting the answer wrong: it's layering together several levels of approximation.
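To make that concrete, here's a toy sketch of what "layered approximation" can look like. This is entirely made up for illustration (it is not Anthropic's actual circuit): one rough order-of-magnitude path plus an exact units-digit path, stitched together.

```python
def toy_multiply(a: int, b: int) -> int:
    # Approximate path: estimate the magnitude from rounded operands
    rough = round(a, -1) * round(b, -1)   # 36 * 59 -> 40 * 60 = 2400
    # Exact path: the units digit follows a simple memorized rule
    last = (a % 10) * (b % 10) % 10       # 6 * 9 = 54 -> units digit 4
    # Stitch the paths: keep the rough magnitude, force the units digit
    return rough - rough % 10 + last

print(toy_multiply(36, 59))  # 2404 -- right magnitude, right last digit
print(36 * 59)               # 2124 -- the actual answer
```

The result looks plausible (right number of digits, right final digit) while being wrong in the middle, which is exactly the failure mode described above.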
The "reasoning" is just entirely made up. We barely understand how LLMs actually work, so none of them have been trained on research about that, which means LLMs don't understand their own functioning (not that they "understand" anything, strictly speaking).
Thing is, it has tool integration. Half the time it uses Python to do the calculation. Using a tool means it writes a string that isn't shown to the user; that string triggers the tool, and the tool's results are appended to the stream.
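A minimal sketch of that loop (the `<tool>` markup and function names here are invented; real systems use their own hidden tokens): the model emits a hidden tool-call string, a runner executes it, and the result is appended to the stream so the model can condition on it.

```python
import re

# Hypothetical tool-call markup, for illustration only.
TOOL_RE = re.compile(r"<tool>(.*?)</tool>", re.DOTALL)

def run_tools(stream: str) -> str:
    """Scan the model's raw output for tool calls, run each one, and
    append the result to the stream. The UI would render only the
    prose, never the <tool>...</tool> span itself."""
    for call in TOOL_RE.findall(stream):
        # Stand-in for a sandboxed Python tool: eval with no builtins
        result = str(eval(call, {"__builtins__": {}}, {}))
        stream += f"<tool_result>{result}</tool_result>"
    return stream

raw = "Let me calculate that precisely. <tool>1234 * 5678</tool>"
print(run_tools(raw))  # ...<tool_result>7006652</tool_result>
```

The point of the appended `<tool_result>` is that the model's later tokens are generated with the tool's answer already in context, so it can quote it instead of approximating.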
What's curious is that it skips a step: you'd expect a request for precision (or just any request to do math) to cause it to use the tool, and the presence of the tool tokens to then cause it to claim a tool was used. Instead, the request for precision causes it to claim a tool was used, directly.
Also, all of this is highly unnatural text, so it's coming either from fine-tuning or from training-data contamination.
A tool uses an LLM, the LLM uses a tool. What a beautiful ouroboros.
I would be careful how you say this. Eliezer likes to go on about giant inscrutable matrices to fearmonger, and the promptfarmers use the (supposed) mysteriousness as another avenue for crithype.
It's true that reverse engineering any specific output or task takes a lot of effort, requires access to the model's internal weights, and hasn't been done for most tasks, but the techniques for doing so exist. And in general there is a good high-level conceptual understanding of what makes LLMs work.
This part is absolutely true. If you catch them in a mistake, they have no way of checking their own internals; most of their data about how to respond comes from how humans respond (or, at best, from fine-tuning on other LLM output), so the words they produce in response to a mistake are just more BS, unrelated to anything.