this post was submitted on 24 Dec 2024

70 points (98.6% liked)

askchapo

23062 readers

113 users here now

Ask Hexbear is the place to ask and answer ~~thought-provoking~~ questions.

Rules:

Posts must ask a question.
If the question asked is serious, answer seriously.
Questions where you want to learn more about socialism are allowed, but questions in bad faith are not.
Try !feedback@hexbear.net if you're having questions about regarding moderation, site policy, the site itself, development, volunteering or the mod team.

founded 5 years ago

MODERATORS

PorkrollPosadist@hexbear.net

replaceable@hexbear.net

VILenin@hexbear.net

EmmaGoldman@hexbear.net

SexUnderSocialism@hexbear.net

ZoomeristLeninist@hexbear.net

khizuo@hexbear.net

Sulvy@hexbear.net

70

What is it about ai art that makes it so recognisable? (Beyond obvious artifacts like fucked up hands or writing) (hexbear.net)

submitted 6 months ago by CliffordBigRedDog@hexbear.net to c/askchapo@hexbear.net

35 comments fedilink hide all child comments

Stumbled onto this shitty game trailer on youtube and instantly clocked it as ai art

Cant really put to words why exactly they're so instantly recognisable

Something about the wierd sharp lines and doll like faces

all 39 comments

sorted by: hot top controversial new old

[–] Krem@hexbear.net 48 points 6 months ago (2 children)

in addition to everything else in this thread, ai """""""art"""""" often has this hyper-saturated feel, a bit like some ads or youtube thumbnails, designed to draw the eye to it but just makes everything look too sharp and flat. like the MSG of graphics. not sure how much that makes sense

also how lazy do you have to be to not hand draw/render a chocolate bar? it's just a bunch of chamfered rectangles! no, instead we get 10 chocolate pieces in 10 different shapes and sizes

[–] CliffordBigRedDog@hexbear.net 20 points 6 months ago (2 children)

MSG is good tho

[–] NephewAlphaBravo@hexbear.net 23 points 6 months ago* (last edited 6 months ago)

yeah but it tastes like shit if you just eat a fistful of it

art tends to have detailed focal points that stand out from a less detailed background, places for your eyes to be drawn and rest. ai slop has this weird uniform level of detail all over that makes my eyes vibrate

[–] Krem@hexbear.net 14 points 6 months ago

it's fine but if you rely on it to make bland food taste OK, instead of making the other ingredients/seasoning shine, it just makes everything taste like MSG. by itself it just tastes like "flavour" without anything behind it. SOOO i guess in this metaphor, graphics that only have "instant attraction" without any thought or concept behind it just has too much AIMSG (adobe instant messenger)

[–] miz@hexbear.net 8 points 6 months ago

chamfered

thank you for teaching me this word

[–] Riffraffintheroom@hexbear.net 37 points 6 months ago (1 children)

Basically there’s a lack of intention. Like the hair in that picture, there’s tons of detailed little strands at the bottom. If a human did this they’d probably have one or two strands like that that imply as wispy texture. The ai doesn’t know why details are good, just that they are good and therefore more of them is more good.

[–] john_browns_beard@hexbear.net 26 points 6 months ago

For me it's the weird shamrock hat thing. You're telling me someone who can draw this well just put some amorphous decoration there with no real detail? It just doesn't make sense.

[–] 7bicycles@hexbear.net 29 points 6 months ago

everything it ever does has this patina of mid-to-late-2000s-deviantart on which for all I care might aswell be because the deviantart website was easy to scrap for copyright infringement / training data

[–] infuziSporg@hexbear.net 26 points 6 months ago

Everything looks "too perfect", like a bunch of elements that are each the platonic ideal or textbook photo of the thing. Everything pops out at you, every last thing is the focus, there isn't really a background built into it.

Also, perhaps as a result of this, the lighting and shading are often incoherent.

[–] jackmarxist@hexbear.net 24 points 6 months ago (1 children)

AI art reminds me of Mr Beast thumbnails

[–] FunkyStuff@hexbear.net 24 points 6 months ago

I don't think the lines are very sharp at all, if anything the style stands out because it tends to have lots of vague blurry regions, where a human drawn image would have details and texture. Something about AI generated images tends to make them much less detailed, at least with far less recognizable detail, than what humans would draw. And when there are details they're nondescript.

[–] TheSpectreOfGay@hexbear.net 20 points 6 months ago

It's because the way it's made is the opposite of how digital art is actually made. things like clothing brushes, symmetry and perspective rulers, and textured brushes, just isn't used by ai obviously. because of that, human made art has a level of cohesion that ai art cannot replicate. this is especially true in shading. while ai might be able to replicate the style of a textured brush, it ends up looking like they used 5,000 different ones for no reason at all. there's no clear intention behind any of the creative decisions made. even if you think certain things in digital art are bad because the creative decision being made is "this saves time", it's still a human making a decision.

so your brain is parsing a jarbled mess, and even if you can't quite place what about it is jarbled, it's easy to pattern recognize that sorta nonsense as ai after seeing enough of it

[–] FlakesBongler@hexbear.net 19 points 6 months ago

It also falls into the same three categories every time

Bad Pixar, Bad Anime, Bad Corporate Memphis

And don't get me started on all the weird gibberish text that always shows up

[–] EstraDoll@hexbear.net 17 points 6 months ago

Incredible detail for a piece made without any clear artistic vision? idk what i'm talking about but this feels like i'm onto something

[–] Comrade_Mushroom@hexbear.net 16 points 6 months ago

The unintentional asymmetry of it is what always exposes it to me, especially when there's no weird finger distortion present.

[–] Drewfro66@lemmygrad.ml 16 points 6 months ago (1 children)

AI art tends to mess up small, repetitive details. The little nubs on the hub of the gear on her shoulder, or on the random plate on her hat. The AI does not understand that these things are supposed to be perfectly identical and symmetrical, so it mixes them up a little bit to make the piece seem more organic.

When a human artist is drawing a background, it will either be (1). Simplified or (2). Full of Easter Eggs. AI will just cram it full of pointless "detail".

[–] keepcarrot@hexbear.net 1 points 6 months ago

Bleh, symmetry in faces or mechanical objects is something I'm really bad at. Huge amount of erase marks in areas that are supposed to have circles :(

[–] TechnoUnionTypeBeat@hexbear.net 16 points 6 months ago (1 children)

I tend to find AI slop just super overcooked. Because the people who use it lack any sense for what art is, they adopt a "more is more" approach, so the art is always just jammed full of details in every inch, regardless of whether it works in terms of composition or sense, making it look busy and messy. Why is there a random shiny gear over her shoulder? What is the Thing on her other shoulder? Why has the chocolate got shid on it?

Even when the slop generator can prompt things well enough to keep the composition restrained, the rendering of everything will be overcooked. Every artist will skimp on some parts of an image to deal with the focal points, while AI will instead go 110% all out on rendering shading and lighting across every inch of the image again making it busy and messy

It's easy to clock this shit because it lacks any actual concept of what humans find appealing, because no human is involved at any step (even prompting it, techbros don't count)

[–] conditional_soup@lemm.ee 5 points 6 months ago

Overcooked is a great way of putting it, tbh

[–] IMF_DOOM@hexbear.net 14 points 6 months ago

The thing i always notice is how weirdly thin the lines get at points and sort of just disappear into nothing

[–] drhead@hexbear.net 11 points 6 months ago

There's usually going to be a hegemonic style for AI art, since for most people making this stuff they're just going to put some vague keywords for a direction of the style then stuff the rest of the prompt with quality keywords. Often times hosted inference services will actually do the quality keyword stuffing for you or train in a house style. Whatever you don't specify is going to be filled in with essentially the model average (which is, of course, not going to be a representative average image, it's going to be the average of the "preferred" set for their preference optimization training). Practically nobody asks for mediocre images (because why would you), and people making models especially on hosted services often effectively won't let you.

Think of what you'd expect to get from requesting an image of "a beautiful woman". There's certainly a lot of different ideas that people have of which women are beautiful and what traits make a woman beautiful, across different individuals and especially across different cultures and time periods. But if you take a set of every picture that someone thought of as having a beautiful woman in it, and look at the mode of that distribution, it's going to settle on conventionally attractive by the standards of whatever group is labeling the images. And the same thing will happen with an AI model, training on those images labeled as "a beautiful woman" will shift its output towards conventionally attractive women. If you consider it as a set of traits contributing to conventional attractiveness, then it's also fairly likely that every "a beautiful woman" image will end up looking like a flawless supermodel, since the mode will be a woman with all of the most common traits in the "a beautiful woman" dataset. That often won't look natural, because we're not used to seeing flawless supermodels all of the time.

That's more or less what is happening when people make these AI images, but with the whole image and its style. The set of images labeled as "high quality" or whatever quality keyword, or that are in their preference optimization set, have attributes that are more common in those images than they are in other images. Those attributes end up becoming dominant and a lot of them will show up in a generated image stuffed with quality keywords or on a heavily DPO-tuned model, which may look unnatural when a typical good-looking natural image may have only a few of those traits. And the problem is exacerbated by each model having its own default flavor, and people heavily reusing the same sets of quality keywords, and I would honestly fully expect that I could pin part of it on how some text encoders work (CLIP's embeddings are hard to separate distinct concepts from and this does manifest in how images are generated, but a lot of recent popular models don't use CLIP so this doesn't necessarily always apply).

[–] glans@hexbear.net 11 points 6 months ago

I'm not as alert to as other people are to this but the above is in the category of "things nobody would ever make" as is a lot of AI stuff. An answer without even the whiff of a question.

But if someone was using this to make a more obviously-desirable thing I probably couldn't recognize it just from the aesthetics.

[–] invalidusernamelol@hexbear.net 8 points 6 months ago (2 children)

Because each diffusion image is built on uniform noise, they come out almost perfectly balanced.

Meaning the sum average of all color in the image is near perfect grey.

This also applies to shape, where shapes and shape distribution tend to be very balanced, symmetrical, and uniform.

[–] CrookedSerpent@hexbear.net 9 points 6 months ago

made-it-the-fuck-up

[–] JillOfAllTrades@hexbear.net 7 points 6 months ago (2 children)

This is untrue.

[–] drhead@hexbear.net 4 points 6 months ago* (last edited 6 months ago) (2 children)

Well, it was true for the first big models. The most recent generation of models do not have this problem.

Earlier models like Stable Diffusion 1.5 worked on noise (ϵ) prediction. All diffusion models work by training to predict where the noise is in an image, given images with differing levels of noise in them, and then you can sample from the model using a solver to get a coherent image in a smaller amount of steps. So, using ϵ as the prediction target, you're obviously not going to learn anything by trying to predict what part of pure noise is noise, because the entire image is noise. During sampling, the model will (correctly) predict on the first step that the pure noise input is pure noise, and remove the noise giving you a black image. To prevent this, people trained models with a non-zero SNR for the highest noise timestep. That way, they are telling the model that there is something actually meaningful in the random noise we're giving it. But since the noise we're giving it is always uniform, it ends up biasing the model towards making images with average brightness. The parts of the initial noise that it retains (since remember, we're no longer asking it to remove all of the noise, we're lying to it and telling it some of it is actually signal) usually also end up causing unusual artifacting. An easy test for these issues is to try to prompt "a solid black background" -- early models will usually output neutral gray squares or grayscale geometric patterns.

One of the early hacks for solving the average brightness issue was training with a random channelwise offset to the noise, and models like Stable Diffusion XL used this method. This allowed models to make very dark and light images, but also often made images end up being too dark or light, it's possible that you saw some of these about a year into the AI craze when this was the latest fad. The proper solution came with Bytedance's paper ( https://arxiv.org/pdf/2305.08891 ) showing a method allowing training with a SNR of zero at the highest noise timestep. The main change is that instead of predicting noise (ϵ), the model needs to predict velocity (v), which is a weighted combination between predicting noise and predicting the original sample x~0~. With that, at the highest noise timestep the sampler will predict the dataset mean (which will manifest as an incredibly blurry mess in the vague shape of whatever you're trying to make an image of). ~~People didn't actually implement this as-is for any new foundation model, most of what I saw of it was independent researchers running finetune projects, apparently because it was taking too much trial and error for larger companies to make it work well.~~ actually this isn't entirely true, people working on video models ended up adopting it more quickly because the artifacts from residual noise get very bad when you add a time dimension. A couple of groups made SDXL clones using this method.

The latest fad is using rectified flow which is a very different process from diffusion. The diffusion process is described by a stochastic differential equation (SDE), which adds some randomness and essentially follows a meandering path from input noise to the resulting image. The rectified flow process is an ordinary differential equation (ODE), which (ideally) follows a straight-line path from the input noise to the image, and can actually be run either forwards or backwards (since it's an ODE). Flux (the model used with Twitter's AI stuff) and Stable Diffusion 3/3.5 both use rectified flow. They don't have the average brightness issue at all because it makes zero mathematical or practical sense to have the end point be anything but pure noise. I've also heard people say that rectified flow doesn't typically show the same uniform level of detail that a few people in this thread have mentioned, I haven't really looked into that myself at all but I would be cautious about using uniform detail as a litmus test for that reason.

[–] invalidusernamelol@hexbear.net 1 points 6 months ago

Thanks for confirming that I'm not totally insane lol, I know a lot of the lighter models still do this and they're very obvious.

[–] invalidusernamelol@hexbear.net 1 points 6 months ago

At this point, uniform detail seems to be only an issue with the lower quality local models. A fun thing I've also noticed outside that is if you ask it to to arr if a band, and it puts text on the kick drum, it's almost always the Beatles font.

[–] invalidusernamelol@hexbear.net 1 points 6 months ago

My bad, I'm probably working on outdated information. I think it was a computerphile video where they showed how diffusion images tend towards a uniform intensity on all color channels due to them starting out as noise.

That all goes out the window of course if the input is not pure white noise of course.

[–] Dessa@hexbear.net 7 points 6 months ago

The darks are too black.

[–] janus2@lemmy.zip 6 points 6 months ago

if I had a nickel every time I saw knockoff Willy Wonka related AI slop I'd have 2 nickels, which isn't a lot but it's weird that it happened twice

[–] PointAndClique@hexbear.net 5 points 6 months ago* (last edited 6 months ago)

Comrade @x87_floatingpoint@hexbear.net had a couple of good threads asking the same question and got good answers, if you want to check its post history

[–] peppersky@hexbear.net 2 points 6 months ago

Somehow metal album covers managed to look like ai art years before ai art

[–] SkingradGuard@hexbear.net 1 points 6 months ago

I prefer those "photo to paint" magic shit that are in some artist painting programs over this ai slop