It would have to be more than just river crossings, yeah.
Although I'm also dubious that their LLM is good enough for universal river crossing puzzle solving using a tool. It's not that simple: the constraints have to be translated into the format that the tool understands, and the answer translated back. I was told that o3 solves my river crossing variant, but the chat log they gave me had incorrect code being run and then a correct answer magically appearing, so I think it wasn't anything quite as general as that.
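To make the "translation" step concrete, here is a minimal sketch of what such a tool could look like, assuming the puzzle can be reduced to a list of items, a list of pairs that can't be left unattended together, and a boat capacity. The `solve` function and its parameters are hypothetical, made up for illustration, and not anything o3 or any vendor actually runs. The search itself is plain breadth-first search; the hard part for an LLM is filling in `items` and `forbidden` correctly from the prose and reading the path back out.

```python
from collections import deque
from itertools import combinations


def solve(items, forbidden, boat_capacity=1):
    """Breadth-first search over (items on left bank, farmer's side) states.

    items:         names of the things to ferry across (hypothetical encoding)
    forbidden:     pairs that must not share a bank without the farmer
    boat_capacity: how many items fit in the boat besides the farmer
    Returns the shortest list of states from start to goal, or None.
    """
    items = frozenset(items)
    forbidden = [frozenset(pair) for pair in forbidden]

    def safe(bank):
        # A bank without the farmer is safe if it holds no forbidden pair.
        return not any(pair <= bank for pair in forbidden)

    start = (items, "L")        # everything on the left, farmer on the left
    goal = (frozenset(), "R")   # nothing on the left, farmer on the right
    parent = {start: None}
    queue = deque([start])

    while queue:
        state = queue.popleft()
        if state == goal:
            # Walk the parent links back to the start to recover the path.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]

        left, side = state
        here = left if side == "L" else items - left
        # The farmer may cross alone (k = 0) or with up to boat_capacity items.
        for k in range(boat_capacity + 1):
            for cargo in combinations(sorted(here), k):
                cargo = frozenset(cargo)
                if side == "L":
                    new_left, new_side = left - cargo, "R"
                    unattended = new_left            # farmer just left this bank
                else:
                    new_left, new_side = left | cargo, "L"
                    unattended = items - new_left    # right bank is now unattended
                nxt = (new_left, new_side)
                if safe(unattended) and nxt not in parent:
                    parent[nxt] = state
                    queue.append(nxt)
    return None


classic = solve(
    ["wolf", "goat", "cabbage"],
    [("wolf", "goat"), ("goat", "cabbage")],
)
```

Any variant that doesn't fit this encoding (extra boats, items that need an escort, asymmetric rules) needs a different state representation, which is exactly where a not-quite-general solution would fall over.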
I think it has gotten to the point where it's about as helpful to point out that it is just an autocomplete bot as it is to point out that "it's just the rotor blades chopping sunlight" when a helicopter pilot is impaired by flicker vertigo and is gonna crash. Or, in the world of the BLIT short story, that it's just some ink on a wall.
The human nervous system is incredibly robust compared to software, or compared to its counterpart in the fictional world of BLIT, or compared to shrimp mesmerized by cuttlefish.
And yet it has exploitable failure modes, and a corporation that is optimizing an LLM for various KPIs is a malign intelligence searching for a way to hack brains, this time with much better automated tooling and a very large budget. One may even call it a super-intelligence, since it is throwing the combined efforts of many people at the problem.
edit: that is to say, there certainly has been something weird going on at the psychological level ever since Eliza.
Yudkowsky is a dumbass layman posing as an expert, and he's playing up his own old preconceived bullshit. But if he can get some of his audience away from the danger, even if he attributes a good chunk of the malevolence to a dumbass autocomplete to do so, that is not too terrible a thing.