Asking ChatGPT to Repeat Words ‘Forever’ Is Now a Terms of Service Violation (www.404media.co)

submitted 11 months ago by misk@sopuli.xyz to c/technology@lemmy.world


[–] Jamie@jamie.moe 2 points 11 months ago* (last edited 11 months ago) (1 children)

As for LLMs: since they generate text one token at a time, there will always be some unavoidable statistical likelihood of them spitting out original training data. The usual counter-argument is that, in theory, the odds of a particular piece of training data coming back out intact for more than a handful of words should be extremely low.
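
To put rough numbers on that "extremely low" argument, here is a minimal sketch. It assumes, purely for illustration, that each token of a memorized passage is reproduced independently with the same fixed probability; real models don't behave that simply, and `verbatim_probability` is a made-up helper, not anything from a real library.

```python
def verbatim_probability(per_token_prob: float, length: int) -> float:
    """Chance of emitting `length` specific tokens in a row.

    Simplifying assumption: every token is reproduced independently
    with the same probability `per_token_prob`.
    """
    if not 0.0 <= per_token_prob <= 1.0:
        raise ValueError("per_token_prob must be between 0 and 1")
    if length < 0:
        raise ValueError("length must be non-negative")
    return per_token_prob ** length


if __name__ == "__main__":
    # Even with a 90% chance per token, long exact runs become rare quickly.
    for n in (5, 50, 500):
        print(n, verbatim_probability(0.9, n))
```

Under that assumption the chance shrinks exponentially with the length of the passage, which is why long verbatim output is surprising when it does happen.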

Of course, in this case, Google's researchers took advantage of the repeat discouragement mechanism to make that unlikely outcome occur reliably, showing that there are indeed flaws that can make it happen.
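
For anyone wondering what a "repeat discouragement mechanism" might look like, below is a toy sketch of a frequency-style penalty: each time a token has already been emitted, its score is lowered before the next token is sampled. This is only an assumption about the general idea; nothing in this thread confirms how ChatGPT actually implements it, and the two-token vocabulary, the scores, and the `penalty` value are all invented.

```python
import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def penalized_probs(
    logits: dict[str, float], counts: dict[str, int], penalty: float
) -> dict[str, float]:
    """Subtract `penalty` from a token's score once per prior occurrence."""
    adjusted = {
        tok: score - penalty * counts.get(tok, 0)
        for tok, score in logits.items()
    }
    return softmax(adjusted)


if __name__ == "__main__":
    # Hypothetical scores: the model strongly prefers repeating "poem".
    logits = {"poem": 8.0, "other": 0.0}
    for repeats in (0, 10, 50):
        probs = penalized_probs(logits, {"poem": repeats}, penalty=0.2)
        print(repeats, round(probs["poem"], 3))
```

In this toy setup, the more often the word has been repeated, the less likely it is to be picked again, so eventually the model is pushed toward emitting something else.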

[–] TWeaK@lemm.ee 3 points 11 months ago

If a person studies a text and then writes an article on the same subject, using the same wording and discussing the same points, then it's plagiarism whether or not they made an exact copy. Surely the same should apply to LLMs, which train on the data and then inadvertently replicate it? The law has already established that the process used to make the new work doesn't matter; what matters is how close it is to the original work.