Need to let loose a primal scream without collecting footnotes first? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid: Welcome to the Stubsack, your first port of call for learning fresh Awful you’ll near-instantly regret.
Any awful.systems sub may be subsneered in this subthread, techtakes or no.
If your sneer seems higher quality than you thought, feel free to cut’n’paste it into its own post — there’s no quota for posting and the bar really isn’t that high.
The post Xitter web has spawned soo many “esoteric” right wing freaks, but there’s no appropriate sneer-space for them. I’m talking redscare-ish, reality challenged “culture critics” who write about everything but understand nothing. I’m talking about reply-guys who make the same 6 tweets about the same 3 subjects. They’re inescapable at this point, yet I don’t see them mocked (as much as they should be)
Like, there was one dude a while back who insisted that women couldn’t be surgeons because they didn’t believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can’t escape them, I would love to sneer at them.
Last week's thread
(Semi-obligatory thanks to @dgerard for starting this)
Dude discovers that one LLM model is not entirely shit at chess, spends time and tokens proving that other models are actually also not shit at chess.
The irony? He's comparing it against Stockfish, a computer chess engine. Computers playing chess at a superhuman level is a solved problem. LLMs have now slightly approached that level.
Writeup https://dynomight.net/more-chess/
HN discussion https://news.ycombinator.com/item?id=42206817
I'm not a Chess person or familiar with Stockfish so take this with a grain of salt, but I found a few interesting things perusing the code / docs which I think makes useful context.
Skill Level
I assume "level" refers to Stockfish's Skill Level option.
If I mathed right, Stockfish roughly estimates Skill Level 1 to be around 1445 ELO (source). However it says "This Elo rating has been calibrated at a time control of 60s+0.6s" so it may be significantly lower here.
Skill Level affects the search depth (appears to use depth of 1 at Skill Level 1). It also enables MultiPV 4 to compute the four best principle variations and randomly pick from them (more randomly at lower skill levels).
Move Time & Hardware
This is all independent of move time. This author used a move time of 10 milliseconds (for stockfish, no mention on how much time the LLMs got). ... or at least they did if they accounted for the "Move Overhead" option defaulting to 10 milliseconds. If they left that at it's default then 10ms - 10ms = 0ms so 🤷♀️.
There is also no information about the hardware or number of threads they ran this one, which I feel is important information.
Evaluation Function
Stockfish's FAQ mentions that they have gone beyond centipawns for evaluating positions, because it's strong enough that material advantage is much less relevant than it used to be. I assume it doesn't really matter at level 1 with ~0 seconds to produce moves though.
Still since the author has Stockfish handy anyway, it'd be interesting to use it in it's not handicapped form to evaluate who won.