this post was submitted on 05 Feb 2025

Greentext

sugar_in_your_tea@sh.itjust.works 3 points 16 hours ago (last edited 16 hours ago)

I asked an LLM to generate tests for a 10 line function with two arguments, no if branches, and only one library function call. It's just a for loop and some math. Somehow it invented arguments, and the ones that actually ran didn't even pass. It made like 5 test functions, spat out paragraphs explaining nonsense, and it still didn't work.

This was one of the smaller DeepSeek models, so perhaps a fancier model would do better.

I'm still messing with it, so maybe I'll find some tasks it's good at.

KillingTimeItself@lemmy.dbzer0.com 1 point 16 hours ago

From what I understand, the "preview" models are quite handicapped; benchmarks are usually run on the full-fat model for that reason. The recent OpenAI one (they have stupid names, idk what is what anymore) had a similar problem.

If it's not a preview model, it's possible a bigger model would help, but usually prompt engineering is going to be more useful. AI is really quick to get confused sometimes.

It might be, idk; my coworker set it up. It's definitely a distilled model, though. I did hope it would do a better job on such a small input.