this post was submitted on 11 Jun 2025
129 points (99.2% liked)
chapotraphouse
Of course it did. LLMs are terrible at any real task that involves actual reasoning and isn't doable by a machine for extruding stochastic, natural-sounding text. Check out this study by a bunch of Apple engineers that points out exactly this: https://machinelearning.apple.com/research/illusion-of-thinking
I get how the LLM is bad at chess; I think most of everyone's games of chess suck ass by definition. But I'm kind of baffled by how it apparently not only played badly but played wrong. How is there a big enough dataset of people yucking it up for that to happen that consistently?
If I say, "Knight to B4," does that sound like something a person playing chess might say? Then it did its job.
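
You can actually see that gap with a few lines of code (a rough sketch, assuming the python-chess library): "Nb4" is a perfectly plausible-sounding move string, but whether it's legal depends entirely on what's on the board, which is exactly the part a text machine doesn't track.

```python
# Rough sketch, assumes the python-chess library (pip install chess).
# "Nf3" and "Nb4" both *sound* like chess moves; only one is legal from the start.
import chess

board = chess.Board()  # standard starting position

for san in ["Nf3", "Nb4"]:
    try:
        move = board.parse_san(san)  # raises if the move is illegal or nonsensical here
        print(f"{san}: legal ({move.uci()})")
    except ValueError:
        print(f"{san}: sounds like a chess move, but isn't legal in this position")
```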
Think of an LLM as an actor. You don't hire someone to play a grandmaster in a movie based on their skill at chess; they might not even know how to play, but if they deliver the lines in a convincing way, that's what you're looking for. There are chess AIs that are incredibly good at chess, because that's what they're designed for and trained on. That's why this is a very silly test; it's like testing a fish on its tree-climbing ability. The only thing sillier than the test itself is that people are surprised by it.
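
And if you actually want good moves, you ask an engine, which is the whole contrast. Here's a sketch (again assuming python-chess, plus a Stockfish binary on your PATH; the "stockfish" name is just an assumption about your install): the engine searches the position, so its move is legal by construction instead of merely sounding plausible.

```python
# Sketch: get a move from a real chess engine via python-chess's UCI interface.
# Assumes python-chess is installed and a Stockfish binary named "stockfish" is on PATH.
import chess
import chess.engine

board = chess.Board()
engine = chess.engine.SimpleEngine.popen_uci("stockfish")
try:
    result = engine.play(board, chess.engine.Limit(time=0.1))  # 100 ms of search
    print("engine move:", board.san(result.move))
finally:
    engine.quit()
```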