this post was submitted on 06 Sep 2023
I like the idea. I don't like the specific implementation.
He was overengineering the problem, wanted a magic solution, and predictably he didn't find one.
No, you can't assume. All hell breaks loose when you pretend to know what you don't, i.e. when you assume.
90% accuracy can be great or awful depending on your goals, but at no point does he mention the scale of the problem, or how bad false positives/negatives would be.
That's fucking dumb. Use both.
Here's what I think would be a better approach, if accuracy is a concern.
Conceptually (inside your head!), split all pairs of addresses into four categories: "same", "different", "dunno", and "shit".
All pairs start in the "dunno" category. The job of the program is to accurately move as many of them as possible to the categories "same" and "different", and as few of them as possible to the "shit" category.
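To make the four categories concrete, here's a minimal sketch in Python. The category names come from the text above; everything else is an assumption made for illustration: the helper names (`normalize`, `standalone_numbers`, `classify`), the two toy rules, and the reading of "shit" as "the simple rules looked at the pair and couldn't decide". Real address matching would need far more rules than this.

```python
import re
from enum import Enum


class Category(Enum):
    DUNNO = "dunno"          # not examined yet (every pair starts here)
    SAME = "same"            # rules are confident: same address
    DIFFERENT = "different"  # rules are confident: different addresses
    SHIT = "shit"            # assumption: examined, but no rule could decide


def normalize(address: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", address.lower())
    return " ".join(cleaned.split())


def standalone_numbers(address: str) -> set[str]:
    """Digit runs that stand alone as words (hypothetical 'different' signal)."""
    return set(re.findall(r"\b\d+\b", address))


def classify(a: str, b: str) -> Category:
    """Move a pair out of DUNNO using two deliberately simple rules."""
    category = Category.DUNNO
    nums_a, nums_b = standalone_numbers(a), standalone_numbers(b)
    if normalize(a) == normalize(b):
        # Rule 1: identical after normalization.
        category = Category.SAME
    elif nums_a and nums_b and nums_a.isdisjoint(nums_b):
        # Rule 2: both contain numbers and share none of them.
        category = Category.DIFFERENT
    else:
        # Neither rule fired, so the program admits it doesn't know.
        category = Category.SHIT
    return category
```

For example, `classify("12 Main St.", "12 main st")` lands in "same", `classify("12 Main St", "14 Main St")` lands in "different", and `classify("12 Main St", "12 Main Street")` lands in "shit" until someone adds a rule for abbreviations.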
Based on that, here's what I would do.
Now run the program on a sizeable number of address pairs, and check how many of them ended up in the "shit" category. Then use your judgment:
Now let's say that you already fixed what you could reasonably fix, and manual review is out of the question. Now plug in the chatbot.
Why am I suggesting that? Because the chatbot will sometimes output garbage, even for pairs that a simple routine could accurately flag as "they're the same" or "they're different". So by using both, you're increasing the accuracy of the whole testing routine. "90%+" might look like "wow such good very accuracy", but it's still up to one error in every 10 pairs, and that's a fucking lot.
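A rough sketch of "use both", again in Python. The names `compare` and `expected_errors` are made up for this example, and `rules` and `chatbot` are placeholder callables that you would supply yourself; no real LLM API is assumed. The arithmetic also assumes the simple rules are never wrong on the pairs they do settle, which is optimistic.

```python
from typing import Callable, Optional

# True = same, False = different, None = the rules could not decide.
Verdict = Optional[bool]


def compare(
    a: str,
    b: str,
    rules: Callable[[str, str], Verdict],
    chatbot: Callable[[str, str], bool],
) -> tuple[bool, str]:
    """Return (verdict, source). The chatbot is asked only when rules give up."""
    verdict = rules(a, b)
    if verdict is not None:
        return verdict, "rules"
    return chatbot(a, b), "chatbot"


def expected_errors(
    total_pairs: int, resolved_by_rules: int, chatbot_accuracy: float
) -> float:
    """Expected wrong answers, assuming the rules are exact on what they resolve."""
    return (total_pairs - resolved_by_rules) * (1 - chatbot_accuracy)
```

Under those assumptions, sending all of 1,000 pairs to a 90% accurate chatbot means roughly 100 wrong answers, while letting simple rules settle 800 of them first leaves roughly 20. Tagging each verdict with its source also tells you which answers deserve more suspicion.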
And that exemplifies better how you're supposed to use LLMs. (Or text generators in general.) You should see them as yet another tool at your disposal, not as a replacement for your current tools.