Stable Diffusion XL Turbo can generate AI images as fast as you can type (arstechnica.com)

submitted 11 months ago by thehatfox@lemmy.world to c/technology@lemmy.world

barsoap@lemm.ee 14 points 11 months ago (last edited 11 months ago)

I'd guess the 'realtime' is a quote from StabilityAI, and of course they're running that stuff on an A100. A couple of seconds is still an interactive rate, since generally speaking you want to think about the changes you're making to your conditioning anyway.

Haven't tried it yet, but if individual steps of XL Turbo take ballpark as much time as LCM steps then... well, it's four to eight times faster. As quality isn't production-ready, we're talking about rough prompt prototyping, testing out an animation pipeline, that kind of stuff. That comes with the caveat that increasing the step count often leads to markedly different results (a complete change of composition, not just details), so the information you gain from those preview-quality images is limited.
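To make the "four to eight times" figure concrete, here is a rough back-of-the-envelope sketch. It assumes, as above, that a Turbo step costs about the same as an LCM step, that LCM previews use 4 to 8 steps, and that Turbo needs a single step. The function name `step_speedup` is made up for illustration.

```python
def step_speedup(baseline_steps: int, fast_steps: int = 1) -> float:
    """Speedup from running fewer sampling steps.

    Assumption: every step costs the same, so time scales with step count.
    """
    if baseline_steps <= 0 or fast_steps <= 0:
        raise ValueError("step counts must be positive")
    return baseline_steps / fast_steps


# Assumed step counts: 4 to 8 for an LCM preview, 1 for Turbo.
for lcm_steps in (4, 8):
    print(f"{lcm_steps}-step LCM vs 1-step Turbo: {step_speedup(lcm_steps):.0f}x")
```

If Turbo steps turn out to be more expensive than LCM steps, the ratio shrinks accordingly.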

Oh, about "production-ready quality": image quality being roughly on par with 4-step LCM means that it's nowhere near production grade. For the final render you still want to give the model more steps. OTOH I've found that some LCM-based merges do in 30 steps what other models need 80 steps for, so improvements are always welcome. But I'm also worried about these distilled models being less flexible, pruning rarely trodden paths that you actually might want the model to take.

EDIT: Addendum: I'm not seeing anything about using this stuff as a LoRA. The nice thing about LCM is that you can take any model you have on your disk and turn it pretty much instantly into a model that can generate fast previews. Also, VAE decoding can already be slower than generation with LCM, so, yeah. I guess having something in between the full VAE and TAESD would be nice. TAESD is fast but quite limited when it comes to details, so much so that you might not even be able to see what kind of texture SD generated. Oh, and it also tends to get colours wrong; at least in my experience the output tends to be oversaturated.
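Why the VAE matters here can be shown with a toy latency model: once sampling is down to a handful of steps, the fixed decode cost dominates, so cutting steps further buys less than the step ratio suggests. All timings below are made-up placeholders, not measurements, and the helper names are hypothetical.

```python
def preview_latency(steps: int, step_seconds: float, decode_seconds: float) -> float:
    """Total time for one preview: sampling steps plus a single VAE decode."""
    return steps * step_seconds + decode_seconds


def decode_share(steps: int, step_seconds: float, decode_seconds: float) -> float:
    """Fraction of the total preview time spent in the VAE decode."""
    return decode_seconds / preview_latency(steps, step_seconds, decode_seconds)


# Hypothetical timings: 0.1 s per sampling step, 0.5 s for a full VAE decode.
STEP_S, DECODE_S = 0.1, 0.5

four_step = preview_latency(4, STEP_S, DECODE_S)
one_step = preview_latency(1, STEP_S, DECODE_S)

print(f"decode share at 4 steps: {decode_share(4, STEP_S, DECODE_S):.0%}")
print(f"decode share at 1 step:  {decode_share(1, STEP_S, DECODE_S):.0%}")
print(f"end-to-end speedup 4 -> 1 steps: {four_step / one_step:.2f}x")
```

With these placeholder numbers the step count drops fourfold but the whole preview only gets about 1.5 times faster, which is why a decoder somewhere between the full VAE and TAESD would help.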