Technology



Make illegally trained LLMs public domain as punishment (www.theregister.com)

submitted 21 hours ago by Joker@sh.itjust.works to c/technology@lemmy.world

154 comments

It's all made from our data, anyway, so it should be ours to use as we want


Bronzebeard@lemm.ee 8 points 20 hours ago

Making it open source doesn't change how it works. It doesn't need the data after it's been trained. Most of these AIs are just figuring out patterns to look for in the new data they come across.

NoForwardslashS@sopuli.xyz 3 points 20 hours ago

So you're saying the data wouldn't exist anywhere in the source code, but it would still be able to answer questions based on the data it has previously seen?

Bronzebeard@lemm.ee 1 point 2 hours ago

Most AI systems are not built to answer questions. They're designed to act as some kind of detection or filtering heuristic that identifies the specific features of an input that lead to a desired output.

stephen01king@lemmy.zip 16 points 20 hours ago

That is how LLMs work: they don't store the training data as data, but as weight values.
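As a loose analogy (a toy linear model, not an actual LLM, and not something anyone in the thread posted), the sketch below fits two weights from four data points, discards the data, and still makes predictions from the weights alone. The names `fit_line` and `predict` are illustrative. A real LLM has billions of weights and can sometimes reproduce memorized fragments of its training text, so the analogy only goes so far.

```python
# Toy analogy only (not an LLM): learn y ≈ w*x + b by least squares,
# discard the training data, then predict using just the learned weights.

def fit_line(xs, ys):
    """Return (w, b) minimizing the squared error of y ≈ w*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

def predict(weights, x):
    """Predict from the learned weights; no training data is needed here."""
    w, b = weights
    return w * x + b

training_x = [1.0, 2.0, 3.0, 4.0]
training_y = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

weights = fit_line(training_x, training_y)
del training_x, training_y  # the "model" is now just two numbers

print(weights)                 # (2.0, 1.0)
print(predict(weights, 10.0))  # 21.0, for an input never seen in training
```

Publishing `weights` here would let anyone make the same predictions without ever seeing the original four data points.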

NoForwardslashS@sopuli.xyz 1 point 18 hours ago

So then why, if it were all open sourced, including the weights, would the AI be worthless? Surely an identical but open-source version would strip profitability from the original paid product.

Bronzebeard@lemm.ee 3 points 16 hours ago

It wouldn't be. It would still work. It just wouldn't be exclusively available to the group that created it, so any competitive advantage is lost.

But all of this ignores the real issue: you're not really punishing the use of unauthorized data. Those who owned that data are still harmed by this.

stephen01king@lemmy.zip 2 points 15 hours ago

It does discourage the use of unauthorised data. If stealing doesn't give you a competitive advantage, it's not really worth the risk and cost of stealing it in the first place.

Bronzebeard@lemm.ee 1 point 2 hours ago

If you can still use it after you stole it, as opposed to not being able to use it at all, then it does give you an incentive.

stephen01king@lemmy.zip 1 point 1 hour ago

If you did all the work, including the potentially criminal collection of data, but everyone else gets the benefit as well, that is not an incentive. You underestimate how selfish corporations can be.

OpenAI wouldn't stay at the forefront of LLMs if every competitor got to use the models it spent money training.