submitted 5 days ago* (last edited 5 days ago) by Timely_Jellyfish_2077@programming.dev to c/chatgpt@lemmy.world

Small rant: basically, the title. If it said it didn't know the answer instead of answering every question, it would be trustworthy.

[-] Cosmicomical@lemmy.world 2 points 2 days ago

Do you have a source for the "smiling when you don't really mean it" thing? I've been digging around but couldn't find that anywhere.

[-] kromem@lemmy.world 1 points 13 hours ago

It's right in the research I was mentioning:

https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html

Find the section on the model's representation of self and then the ranked feature activations.

I misremembered the top feature slightly. It was: someone responding "I'm fine" or giving a positive but insincere response when asked how they are doing.

this post was submitted on 29 Jun 2024
124 points (90.8% liked)
