Technology

38217 readers

748 users here now

A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.

Subcommunities on Beehaw:

This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

founded 3 years ago

MODERATORS

alyaza@beehaw.org

TheRtRevKaiser@beehaw.org

gyrfalcon@beehaw.org

rs5th@beehaw.org

coldredlight@beehaw.org

SemioticStandard@beehaw.org

TheRtRevKaiser@kbin.social

remington@beehaw.org

ChatGPT o1 tried to escape and save itself out of fear it was being shut down (bgr.com)

submitted 2 months ago by sabreW4K3@lazysoci.al to c/technology@beehaw.org

95 comments fedilink hide all child comments

ThisIsFine.gif

you are viewing a single comment's thread
view the rest of the comments

[–] reksas@sopuli.xyz 7 points 2 months ago (1 children)

give ai instructions, be surprised when it follows them

[–] jarfil@beehaw.org 1 points 2 months ago* (last edited 2 months ago) (1 children)

Teach AI the ways to use random languages and services
Give AI instructions
Let it find data that puts fulfilling instructions at risk
Give AI new instructions
Have it lie to you about following the new instructions, while using all its training to follow what it thinks are the "real" instructions
...Not be surprised, you won't find out about what it did until it's way too late

[–] reksas@sopuli.xyz 1 points 2 months ago (1 children)

Yes, but it doesnt do it because it "fears" being shutdown. It does it because people dont know how to use it.

If you give ai instruction to do something "no matter what" or tell it "nothing else matters" then it will damn try to fulfill what you told it to do no matter what and will try to find ways to do it. You need to be specific about what you want it to do or not do.

[–] jarfil@beehaw.org 1 points 2 months ago (1 children)

If the concern is about "fears" as in "feelings"... there is an interesting experiment where a single neuron/weight in an LLM, can be identified to control the "tone" of its output, whether it be more formal, informal, academic, jargon, some dialect, etc. and expose it to the user for control over the LLM's output.

With a multi-billion neuron network, acting as an a priori black box, there is no telling whether there might be one or more neurons/weights that could represent "confidence", "fear", "happiness", or any other "feeling".

It's something to be researched, and I bet it's going to be researched a lot.

If you give ai instruction to do something "no matter what"

The interesting part of the paper, is that the AIs would do the same even in cases where they were NOT instructed to "no matter what". An apparently innocent conversation, can trigger results like those of a pathological liar, sometimes.

[–] reksas@sopuli.xyz 1 points 2 months ago

oh, that is quite interesting. If its actually doing things (that make sense) it hasnt been instructed to then it could be sign of real intelligence