this post was submitted on 23 Oct 2023

533 points (85.9% liked)

Technology

59317 readers

4531 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related content.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots are allowed; to ask whether your bot can be added, please contact us.
Check for duplicates before posting; duplicates may be removed.

founded 1 year ago

This new data poisoning tool lets artists fight back against generative AI (www.technologyreview.com)

submitted 1 year ago by ElectroVagrant@lemmy.world to c/technology@lemmy.world

131 comments

A new tool lets artists add invisible changes to the pixels in their art before they upload it online so that if it’s scraped into an AI training set, it can cause the resulting model to break in chaotic and unpredictable ways.

The tool, called Nightshade, is intended as a way to fight back against AI companies that use artists’ work to train their models without the creator’s permission.
[...]
Zhao’s team also developed Glaze, a tool that allows artists to “mask” their own personal style to prevent it from being scraped by AI companies. It works in a similar way to Nightshade: by changing the pixels of images in subtle ways that are invisible to the human eye but manipulate machine-learning models to interpret the image as something different from what it actually shows.
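For readers wondering how changes "invisible to the human eye" can still mislead a model, here is a minimal sketch of the general idea. It assumes a hypothetical four-pixel image and a toy linear classifier, and it uses the well-known fast gradient sign method (FGSM), which attacks a single prediction of an already trained model. It is NOT Nightshade's or Glaze's actual technique (Nightshade targets the training process, and the article does not give implementation details); the weights, pixels and epsilon are made-up illustration values.

```python
# Minimal FGSM-style sketch on a hypothetical toy linear classifier.
# NOT Nightshade's or Glaze's method; it only shows how small, bounded
# pixel changes can push a model's decision the other way.


def sign(x: float) -> int:
    """Return +1, -1 or 0 depending on the sign of x."""
    return (x > 0) - (x < 0)


def linear_score(weights, pixels, bias):
    """Toy model score: positive means class A, negative means class B."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias


def fgsm_perturb(weights, pixels, epsilon):
    """Lower the score while changing each pixel by at most epsilon.

    For a linear score the gradient with respect to each pixel is just its
    weight, so stepping against sign(weight) is the fast gradient sign method.
    Results are clipped to the valid pixel range [0, 1].
    """
    return [min(1.0, max(0.0, p - epsilon * sign(w)))
            for w, p in zip(weights, pixels)]


# Hypothetical four-pixel "image" and model parameters, chosen for illustration.
weights = [0.9, -0.4, 0.7, -0.8]
bias = -0.3
pixels = [0.6, 0.3, 0.55, 0.4]

perturbed = fgsm_perturb(weights, pixels, epsilon=0.1)
print(round(linear_score(weights, pixels, bias), 3))     # 0.185
print(round(linear_score(weights, perturbed, bias), 3))  # -0.095
print(round(max(abs(a - b) for a, b in zip(pixels, perturbed)), 6))  # 0.1
```

Each pixel moves by at most 0.1, yet the score flips sign. Real image models have millions of parameters, so the gradient would come from backpropagation rather than being read straight off the weights.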

[–] 9thSun@midwest.social 9 points 1 year ago (4 children)

How is training AI with art on the web different to a person studying art styles? I'd say if the AI is being monetized in some capacity, then sure, maybe there should be laws in place. I'm just hard-pressed to believe that anyone can have sole control of anything once it gets on the Internet.

[–] Zeth0s@lemmy.world 8 points 1 year ago* (last edited 1 year ago) (1 children)

I work in AI and I believe it is different. Society is built to distribute wealth, so that everyone can live a decent life. People and AI should be treated differently under the law. Also, non-commercial, open-source AI should be treated differently from commercial or closed-source models.

[–] vidarh@lemmy.stad.social 9 points 1 year ago (1 children)

Society is built to distribute wealth, so that everyone can live a decent life.

As a goal, I admire it, but if you intend this as a description of how things are, it would be boundlessly naive.

[–] Zeth0s@lemmy.world 4 points 1 year ago* (last edited 1 year ago) (1 children)

That's absolutely not how it is now; it's just the goal we should set for ourselves, and one I believe we should consider when regulating AI.

[–] vidarh@lemmy.stad.social 5 points 1 year ago* (last edited 1 year ago) (1 children)

To me, that's not an argument for regulating AI, though, because most regulation we can come up with will benefit those with deep enough pockets to buy themselves out of the problem, while solving nothing.

E.g., as I've pointed out in other debates like this, Getty Images has a market cap of under $2bn. OpenAI may have had a valuation in the $90bn range. Google, MS and Adobe also all have valuations that would trivially allow them to purchase a company like Getty to get ownership of a large training set of photos. Adobe already has rights to a huge selection via its own stock service.

Bertelsmann owns Penguin Random House and a range of other publishing subsidiaries. Its market cap is around 15 billion euros, also well within reach for a large AI contender to buy in order to insert clauses about AI rights. (You think authors will refuse to accept that? All but the top sellers will generally be unable to afford to turn down a publishing deal, especially if it's sugar-coated enough. The publishers also sit on a shit-ton of works where the source text is out of copyright but they own the rights to the translations outright as works for hire.)

That's before considering simply hiring a bunch of writers and artists to produce training data as work for hire.

So any regulation you put in place to limit the use of copyrighted works effectively only creates a "tax".

E.g., OpenAI might not be able to copy artist X's images, but they'll be able to hire artist Y on the cheap to churn out art in artist X's style for hire, and then train on that. They might not be able to use author Z's work, but they can hire a bunch of hungry writers as a content farm (the average published book sells around 200 copies, and the average full-time author in the UK earns below minimum wage from their writing).

The net result for most creators will be the same.

Ever wonder why Sam Altman of OpenAI has been lobbying about the dangers of AI? This is why. And it's just the start. As soon as these companies have enough capital to buy access to data, regulations preventing training on copyrighted data will be them pulling up the drawbridge, making it cost-prohibitive for people to build open, publicly accessible models that can be legally used.

And in doing so they'll effectively get to charge an "AI tax" on everyone else.

If we're going to protect artists, we'd be far better off finding other ways of compensating them for the effects, not least because it will actually provide them some protection.

[–] Zeth0s@lemmy.world 2 points 1 year ago (2 children)

UBI is the known solution for protecting workers. The solution exists; people aren't ready for it.

[–] vidarh@lemmy.stad.social 1 points 1 year ago

As long as people aren't ready for it, it doesn't solve the immediate problem that needs to be solved today.

[–] BearOfaTime@lemm.ee -1 points 1 year ago (1 children)

Lol.

How does UBI break trademark and copyright law (and therefore legal cases)?

Do you really think the current power brokers will suddenly sit on their hands and stop trying to (mostly successfully) control as much as they can?

[–] Zeth0s@lemmy.world 1 points 1 year ago* (last edited 1 year ago)

UBI is needed because most of the jobs people currently do are already unnecessary. They exist just to redistribute wealth; most jobs are already useless (if you work in corporate, the public sector or retail, you know what I am talking about). In the future, more will become useless. Current copyright laws are already outdated and no longer work. The only safe solution for people who want to dedicate their lives to visual art is UBI, for the reasons already known. Most "artists" are not really doing art, but simply a job for the entertainment industry that in the future will be done by far fewer people due to technological and organizational changes. This is already happening now, even before AI.

UBI is a solution for situations like this, which will become even more common in the future. We need better ways to redistribute wealth from what you call "power brokers" to the larger society.

[–] FooBarrington@lemmy.world 5 points 1 year ago

I agree that the training isn't fundamentally different, but monetization of the output has to be controlled. The big difference between AI and humans is the speed with which they create: you would have to employ an army of humans to match the output of a couple of GPUs. For noncommercial projects this is amazing. For commercial projects, it destroys artists' livelihoods.

But this simply means that training shouldn't be controlled; inference in commercial contexts should be.

[–] rhombus@sh.itjust.works 0 points 1 year ago

The real issue lies in the ownership of the AI models and the vast amount of labor involved in the training data. It's taking what is probably hundreds of thousands of hours of labor in the form of art and converting it into a proprietary machine, all without compensating the artists involved. Whether you can make a comparison to a human studying art is irrelevant, because a corporation can't own an artist, but it can own an AI and not have to pay it.

[–] realharo@lemm.ee -4 points 1 year ago* (last edited 1 year ago) (1 children)

How is training AI with art on the web different to a person studying art styles?

Human brains clearly work differently than AI, how is this even a question?

The term "learning" in machine learning is mainly a metaphor.

Also, laws are written with a practical purpose in mind - they are not some universal, purely philosophical construct and never have been.

[–] vidarh@lemmy.stad.social 8 points 1 year ago (1 children)

Human brains clearly work differently than AI, how is this even a question?

It's not all that clear that those differences are qualitatively meaningful, but that is irrelevant to the question they asked, so this is entirely a strawman.

Why would differences in how AI and the brain learn make training AI on art different from a person studying art styles? Both learn to generalise features in a way that allows them to reproduce those features. Both can do so without copying specific source material.

The term “learning” in machine learning is mainly a metaphor.

How does the way they learn differ from how humans learn? They generalise. They form "world models" of how information relates. They extrapolate.

Also, laws are written with a practical purpose in mind - they are not some universal, purely philosophical construct and never have been.

This is the only uncontroversial part of your answer. The main reason courts will treat human and AI actions differently is simply that AI is not human. For the foreseeable future, it will have little to do with whether the processes are similar enough to how humans do it.

[–] realharo@lemm.ee -4 points 1 year ago* (last edited 1 year ago) (1 children)

Now you're just cherry-picking some surface-level similarities.

You can see the difference in the process in the results, for example in how some generated pictures will contain something like a signature in the corner, simply because it resembles the training data - even though there is no meaning to it. Or how it is at least possible to get the model to output something extremely close to the training data - https://gizmodo.com/ai-art-generators-ai-copyright-stable-diffusion-1850060656.

That at least proves that the process is quite different to the process of human learning.

The question is how much those differences matter, and which similarities you want to focus on.

Human learning is similar in some ways, but greatly differs in other ways.

The fact that you're picking and choosing which similarities matter and which don't is just your arbitrary choice.

[–] vidarh@lemmy.stad.social 8 points 1 year ago* (last edited 1 year ago) (1 children)

You can see the difference in the process in the results, for example in how some generated pictures will contain something like a signature in the corner

If you were to train human children on an endless series of pictures with signatures in the corner, do you seriously think they'd not emulate signatures in the corner?

If you think that, you haven't seen many children's drawings, because children also often pick up that it's normal to put something in the corner, even though pictures with signatures make up a tiny proportion of their visual input.

Or how it is at least possible to get the model to output something extremely close to the training data

People also mimic. We often explicitly learn to mimic; e.g., I have my son's art folder right here, full of examples of him being explicitly taught to make direct copies as a means to learn technique.

We just don't have very good memory. This is an argument for a difference in ability to retain and reproduce inputs, not an argument for a difference in methods.

And again, this is a strawman. It doesn't even begin to try to answer the questions I asked, or the one raised by the person you first responded to.

That at least proves that the process is quite different to the process of human learning.

Neither of those really suggests that at all (that diffusion models generalise images differently from how humans learn to is likely true, but what you've described doesn't provide even the start of any evidence of that), but again, that is a strawman.

There was no claim they work the same. The question raised was how the way they're trained is different from how a human learns styles.

[–] 9thSun@midwest.social 4 points 1 year ago

I appreciate your responses, thank you!