this post was submitted on 29 Sep 2023

439 points (93.5% liked)

Authors Are Furious After Finding Their Works on List of Books Used To Train AI (www.themarysue.com)

submitted 1 year ago by stopthatgirl7@kbin.social to c/technology@lemmy.world

Authors using a new tool to search a list of 183,000 books used to train AI are furious to find their works on the list.

[–] newthrowaway20@lemmy.world 32 points 1 year ago* (last edited 1 year ago) (3 children)

That's an interesting take; I didn't know software could be inspired by other people's works. And here I thought software just did exactly as it's instructed to do. These are language models. They were given data to train those models. Did the companies pay for the data they used for training, or did they scrape the internet and steal all these books along with everything everyone else has said?

[–] FaceDeer@kbin.social 3 points 1 year ago (2 children)

Well, now you know; software can be inspired by other people's works. That's what AIs are instructed to do during their training phase.

[–] newthrowaway20@lemmy.world 0 points 1 year ago (2 children)

Does that mean software can also be afraid, or angry? What about happy software? Saying software can be inspired is like saying a rock can feel pain.

[–] lloram239@feddit.de 4 points 1 year ago* (last edited 1 year ago) (1 children)

Does that mean software can also be afraid, or angry?

If it is programmed/trained that way, sure. I recommend having a listen to Geoffrey Hinton on the topic (41:50).

Saying software can be inspired is like saying a rock can feel pain.

The rock doesn't do anything similar to pain. The LLM, on the other hand, does a lot of things similar to inspiration. I can give the LLM a very trivial question and it will answer with a mountain of text. Did my question or the books it was trained on "inspire" the LLM to write that? Maybe; it depends, of course, on how broadly you want to define the word. But either way, the LLM produced something by itself that was neither a copy of my prompt nor of the training data.

[–] PipedLinkBot@feddit.rocks 1 points 1 year ago

Here is an alternative Piped link(s):

Geoffrey Hinton on the topic

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I'm open-source; check me out at GitHub.

[–] FaceDeer@kbin.social -3 points 1 year ago

Software can do a lot of things that rocks can't do, so that's not a good analogy.

Whether software can feel "pain" depends a lot on your definitions, but I think there are circumstances in which software can be said to feel pain. Simple worms can sense painful stimuli and react to them; a program can do the same thing.

We've reached the point where the simplistic prejudices about artificial intelligence common in science fiction are no longer useful guidelines for talking about real artificial intelligence. Sci-fi writers have long assumed that AIs couldn't create art and now it turns out it's one of the things they're actually rather good at.

[–] BURN@lemmy.world -3 points 1 year ago (1 children)

Software cannot be “inspired”

AIs in their training stages are simply running extreme statistical analysis on the input material. They’re not “learning,” they’re not “inspired,” they’re not “understanding.”

The anthropomorphism of these models is a major problem. They are not human, they don’t learn like humans.

[–] lloram239@feddit.de 3 points 1 year ago (1 children)

The anthropomorphism of these models is a major problem.

People attributing any kind of personhood or sentience to them is certainly a problem; the models are fundamentally not capable of that (no loops, no hidden thought), at least for now. However, what you are doing isn't really much better, just utterly wrong in the opposite direction.

Those models very definitely do "learn" and "understand" by every definition of the word. Simply playing around with them will quickly show that, and it's baffling that anybody would try to claim otherwise. Yes, there are limits to what they can understand and there are plenty of things that they can't do, but the number of questions they can answer goes far beyond what is directly in the training data. Heck, even the fact that they hallucinate is proof that they understand, since it would be impossible to make up completely plausible but incorrect stuff without having a deep understanding of the topics. Also, humans make mistakes too, and they'll also make stuff up, so this isn't even anything AI-specific.

[–] BURN@lemmy.world -3 points 1 year ago (1 children)

Yeah, that’s just flat out wrong

Hallucinations happen when there are gaps in the training data and it’s just statistically picking what’s most likely to be next. It becomes incomprehensible when the model breaks down and doesn’t know where to go. However, the model doesn’t see a difference between hallucinating nonsense and a coherent sentence. They’re exactly the same to the model.

The model does not learn or understand anything. It statistically knows what the next word is. It doesn’t need to have seen something before to know that. It doesn’t understand what it’s outputting, it’s just outputting a long string that is gibberish to it.
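To make "statistically picking what's most likely to be next" concrete, here is a toy sketch in Python. It is a bigram counter, not how any real LLM is implemented: real models use learned parameters and can generalize to inputs they haven't seen, which this toy cannot. The function names and the sample corpus are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical training text, purely for illustration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)

print(most_likely_next(model, "the"))  # cat ("cat" follows "the" twice, "mat" once)
print(most_likely_next(model, "dog"))  # None ("dog" never appears in the corpus)
```

Note that after training, `model` holds only follower counts, not the original sentence. Whether picking from such statistics counts as "understanding" is the point being argued in this thread; the sketch doesn't settle that.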

I have formal training in AI and 90%+ of what I see people claiming AI can do is a complete misunderstanding of the tech.

[–] lloram239@feddit.de 2 points 1 year ago* (last edited 1 year ago)

I have formal training in AI

Then why do you keep talking such bullshit? You sound like you never even tried ChatGPT.

It statistically knows what the next word is.

Yes, that's understanding. What do you think your brain does differently? Please define whatever weird definition you have of "understand".

You are aware of Emergent World Representations? Or have a listen to what Ilya Sutskever has to say on the topic, one of the people behind GPT-4 and AlexNet.

It doesn’t understand what it’s outputting, it’s just outputting a long string that is gibberish to it.

Which is obviously nonsense, as I can ask it questions about its output. It can find mistakes in its own output and all that. It obviously understands what it is doing.

[–] PsychedSy@sh.itjust.works -2 points 1 year ago (1 children)

They weren't given data. They were shown data, and then the company spent tens of millions of dollars on CPU time to do statistical analysis of the data shown.

[–] newthrowaway20@lemmy.world 7 points 1 year ago (1 children)

A computer being shown data is a computer being given data. I don't understand your argument.

[–] lloram239@feddit.de 0 points 1 year ago

The data is gone by the time a user interacts with the AI. ChatGPT has no access to any books.