this post was submitted on 08 Jun 2025
85 points (100.0% liked)

TechTakes

[–] Soyweiser@awful.systems 2 points 1 day ago (1 children)

Btw, the latter test fails if they write a specific bit of code to put out the 'LLMs fail the river crossing' fire. Still a good test.

[–] diz@awful.systems 2 points 5 hours ago

It would have to be more than just river crossings, yeah.

Although I'm also dubious that their LLM is good enough to solve river crossing puzzles universally using a tool. It's not that simple: the constraints have to be translated into a format the tool understands, and the answer has to be translated back. I was told that o3 solves my river crossing variant, but the chat log they gave me showed incorrect code being run and then a correct answer magically appearing, so I think it wasn't anything quite as general as that.
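
To make "the format that the tool understands" concrete, here is a rough sketch of what such a tool could look like: a small breadth-first search in Python. Everything in it is an assumption for illustration (the `solve` function, the `forbidden` pair encoding, the `capacity` parameter); it is not the code from that chat log or anything a vendor is known to run. The point is that the search itself is the easy part, while turning a worded puzzle into `items` and `forbidden` is where a variant can go wrong.

```python
from collections import deque
from itertools import combinations


def solve(items, forbidden, capacity=1):
    """Breadth-first search over (items on left bank, boat side) states.

    items     -- names of things to ferry from the left bank to the right
    forbidden -- pairs that must not be left together without the farmer
    capacity  -- how many items the boat carries besides the farmer
    Returns a shortest list of (departure side, cargo) moves, or None.
    """
    items = frozenset(items)
    forbidden = [frozenset(pair) for pair in forbidden]

    def safe(bank):
        # A bank without the farmer is safe if no forbidden pair shares it.
        return not any(pair <= bank for pair in forbidden)

    start = (items, "L")
    goal = (frozenset(), "R")
    parent = {start: None}  # state -> (previous state, move taken)
    queue = deque([start])

    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while parent[state] is not None:
                state, move = parent[state]
                path.append(move)
            return path[::-1]

        left, side = state
        here = left if side == "L" else items - left
        for k in range(capacity + 1):
            for cargo in combinations(sorted(here), k):
                cargo = frozenset(cargo)
                new_left = left - cargo if side == "L" else left | cargo
                # The bank the farmer just left is the unattended one.
                unattended = new_left if side == "L" else items - new_left
                if not safe(unattended):
                    continue
                nxt = (new_left, "R" if side == "L" else "L")
                if nxt not in parent:
                    parent[nxt] = (state, (side, tuple(sorted(cargo))))
                    queue.append(nxt)
    return None


if __name__ == "__main__":
    classic = solve(
        ["wolf", "goat", "cabbage"],
        [("wolf", "goat"), ("goat", "cabbage")],
    )
    for side, cargo in classic:
        print(side, "->", cargo or "(empty boat)")
```

A variant that changes who threatens whom, or the boat size, only changes the arguments, so a model that really translated the puzzle would produce different `forbidden` pairs rather than reciting the classic answer. A puzzle with no solution comes back as `None`.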