Stable Diffusion


Discuss matters related to our favourite AI Art generation technology


This is a copy of the /r/stablediffusion wiki, to help people who need access to that information.


Howdy and welcome to r/stablediffusion! I'm u/Sandcheeze and I have collected these resources and links to help you enjoy Stable Diffusion, whether you are here for the first time or looking to add more customization to your image generations.

If you'd like to show support, feel free to send us kind words or check out our Discord. Donations are appreciated but not necessary; being a great part of the community is all we ask for.

Note: The community resources provided here are not endorsed, vetted, or provided by Stability AI.

# Stable Diffusion

Local Installation

Active Community Repos/Forks to install on your PC and keep it local.

Online Websites

Websites with usable Stable Diffusion right in your browser. No need to install anything.

Mobile Apps

Stable Diffusion on your mobile device.

Tutorials

Learn how to improve your skills with Stable Diffusion, whether you are a beginner or an expert.

DreamBooth

How to train a custom model, plus resources on doing so.

Models

Specially trained towards certain subjects and/or styles.

Embeddings

Tokens trained on specific subjects and/or styles.

Bots

Either bots you can self-host, or bots you can use directly on various websites and services such as Discord, Reddit, etc.

3rd Party Plugins

SD plugins for programs such as Discord, Photoshop, Krita, Blender, GIMP, etc.

Other useful tools

# Community

Games

  • PictionAIry: (Video | 2-6 Players) - The image-guessing game where AI does the drawing!

Podcasts

Databases or Lists

Still updating this with more links as I collect them all here.

FAQ

How do I use Stable Diffusion?

  • Check out our guides section above!

Will it run on my machine?

  • Stable Diffusion requires a GPU with at least 4GB of VRAM to run locally. Much beefier graphics cards (10-, 20-, or 30-series Nvidia cards) are needed to generate high-resolution or high-step images. Alternatively, anyone can run it online through DreamStudio or by hosting it on their own GPU compute cloud server. (A quick local check is sketched after this list.)
  • Only Nvidia cards are officially supported.
  • AMD support is available here unofficially.
  • Apple M1 Chip support is available here unofficially.
  • Intel based Macs currently do not work with Stable Diffusion.
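
For a quick local check of the requirement above, here is a minimal, hedged Python sketch (it assumes PyTorch is installed; the 4GB threshold mirrors the guidance above and is not a hard limit):

```python
import torch

# Minimal sketch: report whether a CUDA GPU is visible and how much VRAM it has.
# The 4 GB threshold follows the FAQ guidance above; it is not a hard requirement.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
    print("Likely OK for local Stable Diffusion" if vram_gb >= 4 else "Below the suggested 4 GB of VRAM")
else:
    print("No CUDA GPU detected; consider an online service such as DreamStudio.")
```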

How do I get a website or resource added here?

If you have a suggestion for a website or a project to add to our list, or if you would like to contribute to the wiki, please don't hesitate to reach out to us via modmail or message me.


A quantized version of ControlNet Union for Flux, for less powerful computers.


TL;DR

A new post-training quantization paradigm for diffusion models that quantizes both the weights and activations of FLUX.1 to 4 bits, achieving a 3.5× memory and 8.7× latency reduction on a 16GB laptop 4090 GPU.

Paper: http://arxiv.org/abs/2411.05007

Weights: https://huggingface.co/mit-han-lab/svdquant-models

Code: https://github.com/mit-han-lab/nunchaku

Blog: https://hanlab.mit.edu/blog/svdquant

Demo: https://svdquant.mit.edu/
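
To make the idea concrete, here is a toy, hedged sketch of the "small high-precision low-rank branch plus 4-bit residual" decomposition that SVDQuant is built around; it only illustrates the concept and is not the released calibration code, CUDA kernels, or the nunchaku API:

```python
import torch

def svd_lowrank_plus_int4(W: torch.Tensor, rank: int = 32):
    """Toy illustration of the SVDQuant idea: keep a small low-rank branch of a
    weight matrix in high precision and (fake-)quantize the residual to 4 bits.
    Conceptual sketch only, not the paper's kernels or calibration."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    L = U[:, :rank] * S[:rank]            # (out, rank) low-rank factor, high precision
    R = Vh[:rank]                         # (rank, in)
    residual = W - L @ R
    scale = residual.abs().max() / 7.0    # symmetric int4 grid covers [-8, 7]
    q = torch.clamp(torch.round(residual / scale), -8, 7)
    return L, R, q.to(torch.int8), scale  # int8 used as a container for 4-bit values

W = torch.randn(256, 256)
L, R, q, scale = svd_lowrank_plus_int4(W)
W_hat = L @ R + q.float() * scale         # dequantized approximation of W
print(f"mean abs error: {(W - W_hat).abs().mean().item():.4f}")
```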


Abstract

Diffusion models have demonstrated excellent capabilities in text-to-image generation. Their semantic understanding (i.e., prompt-following) ability has also been greatly improved with large language models (e.g., T5, Llama). However, existing models cannot perfectly handle long and complex text prompts, especially when the text prompts contain various objects with numerous attributes and interrelated spatial relationships. While many regional prompting methods have been proposed for UNet-based models (SD1.5, SDXL), there are still no implementations based on the recent Diffusion Transformer (DiT) architecture, such as SD3 and FLUX.1. In this report, we propose and implement regional prompting for FLUX.1 based on attention manipulation, which enables DiT with fine-grained compositional text-to-image generation capability in a training-free manner. Code is available at https://github.com/instantX-research/Regional-Prompting-FLUX.

Paper: https://arxiv.org/abs/2411.02395

Code: https://github.com/instantX-research/Regional-Prompting-FLUX
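
As a rough illustration of what "regional prompting via attention manipulation" means, here is a hedged, conceptual PyTorch sketch in which each image token attends only to the prompt embedding of the region it belongs to; the function, shapes, and masking scheme are illustrative assumptions, not the repository's actual implementation:

```python
import torch
import torch.nn.functional as F

def regional_cross_attention(img_queries, region_prompt_embeds, region_masks):
    """Conceptual sketch of regional prompting: image tokens inside a region
    attend only to that region's prompt embedding.

    img_queries:          (B, N_img, D) image-token queries
    region_prompt_embeds: list of (B, N_txt, D) text embeddings, one per region
    region_masks:         list of (B, N_img) boolean masks, one per region
    """
    out = torch.zeros_like(img_queries)
    for prompt_embed, mask in zip(region_prompt_embeds, region_masks):
        # keys/values come from this region's prompt only
        attended = F.scaled_dot_product_attention(img_queries, prompt_embed, prompt_embed)
        out = out + attended * mask.unsqueeze(-1).to(out.dtype)
    return out

# Tiny usage example with two regions splitting a 16-token "image" in half.
B, N_img, N_txt, D = 1, 16, 4, 8
q = torch.randn(B, N_img, D)
embeds = [torch.randn(B, N_txt, D), torch.randn(B, N_txt, D)]
masks = [torch.arange(N_img).unsqueeze(0) < 8, torch.arange(N_img).unsqueeze(0) >= 8]
print(regional_cross_attention(q, embeds, masks).shape)  # torch.Size([1, 16, 8])
```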

  • Add Intel Core Ultra Series 2 (Lunar Lake) NPU support by @rupeshs in #277
  • Seeding improvements by @wbruna in #273

Details: https://github.com/Nerogar/OneTrainer/blob/master/docs/RamOffloading.md

  • Flux LoRA training on 6GB GPUs (at 512px resolution)
  • Flux Fine-Tuning on 16GB GPUs (or even less), plus 64GB of system RAM
  • SD3.5-M Fine-Tuning on 4GB GPUs (at 1024px resolution)

Highlights for 2024-10-29

  • Support for all SD3.x variants
    SD3.0-Medium, SD3.5-Medium, SD3.5-Large, SD3.0-Large-Turbo
  • Allow quantization using bitsandbytes on-the-fly during model load
    Load any variant of SD3.x or FLUX.1 and apply quantization during load, without the need for pre-quantized models (see the sketch after this list)
  • Allow for custom model URL in standard model selector
    Can be used to specify any model from HuggingFace or CivitAI
  • Full support for torch==2.5.1
  • New wiki articles: Gated Access, Quantization, Offloading
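
For readers unfamiliar with what on-the-fly quantization during load looks like, here is a hedged sketch of the same general technique using the diffusers + bitsandbytes route (this is not SD.Next's internal code; it assumes a recent diffusers build with bitsandbytes support, and the model ID and 4-bit settings are illustrative):

```python
import torch
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel, StableDiffusion3Pipeline

# Quantize the SD3.5 transformer to 4-bit NF4 while loading,
# instead of downloading a pre-quantized checkpoint.
nf4 = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
transformer = SD3Transformer2DModel.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", subfolder="transformer",
    quantization_config=nf4, torch_dtype=torch.bfloat16)
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", transformer=transformer,
    torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # keep VRAM use low on smaller GPUs
image = pipe("a watercolor fox in a snowy forest", num_inference_steps=28).images[0]
image.save("fox.png")
```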

Plus tons of smaller improvements and cumulative fixes reported since the last release.

README | CHANGELOG | WiKi | Discord


Abstract

We propose Framer for interactive frame interpolation, which targets producing smoothly transitioning frames between two images as per user creativity. Concretely, besides taking the start and end frames as inputs, our approach supports customizing the transition process by tailoring the trajectory of some selected keypoints. Such a design enjoys two clear benefits. First, incorporating human interaction mitigates the issue arising from numerous possibilities of transforming one image to another, and in turn enables finer control of local motions. Second, as the most basic form of interaction, keypoints help establish the correspondence across frames, enhancing the model to handle challenging cases (e.g., objects on the start and end frames are of different shapes and styles). It is noteworthy that our system also offers an "autopilot" mode, where we introduce a module to estimate the keypoints and refine the trajectory automatically, to simplify the usage in practice. Extensive experimental results demonstrate the appealing performance of Framer on various applications, such as image morphing, time-lapse video generation, cartoon interpolation, etc. The code, the model, and the interface will be released to facilitate further research.

Paper: https://arxiv.org/abs/2410.18978

Code: https://github.com/aim-uofa/Framer

Project Page: https://aim-uofa.github.io/Framer/#comparison_with_baseline_container


Highlights for 2024-10-23

A month later and with nearly 300 commits, here is the latest SD.Next update!

Workflow highlights

  • Reprocess: New workflow options that let you generate at lower quality and then
    reprocess only selected images at higher quality, or generate without hires/refine and then reprocess with hires/refine,
    and you can pick any previous latent from the auto-captured history!
  • Detailer: Fully built-in detailer workflow with support for all standard models
  • Built-in model analyzer
    See all details of your currently loaded model, including components, parameter count, layer count, etc.
  • Extract LoRA: Load any LoRA(s) and generate as usual,
    and once you like the results, simply extract a combined LoRA for future use!

New models

What else?

  • Tons of work on dynamic quantization that can be applied on-the-fly during model load to any model type (you do not need to use pre-quantized models)
    Supported quantization engines include BitsAndBytes, TorchAO, Optimum.quanto, NNCF compression, and more...
  • Auto-detection of the best available device/dtype settings for your platform and GPU reduces the need for manual configuration
    Note: This is a breaking change to default settings, and it's recommended to check your preferred settings after upgrading
  • Full rewrite of sampler options, now far more streamlined, with tons of new options to tweak scheduler behavior
  • Improved LoRA detection and handling for all supported models
  • Several Flux.1 optimizations and new quantization types

Oh, and we've compiled a full table listing the top 30 most popular text-to-image generative models (how many have you tried?),
with their respective parameters and an architecture overview: Models Overview

And there are also other goodies like multiple XYZ grid improvements, additional Flux ControlNets, additional Interrogate models, better LoRA tags support, and more...
README | CHANGELOG | WiKi | Discord


Abstract

Significant advancements have been made in the field of video generation, with the open-source community contributing a wealth of research papers and tools for training high-quality models. However, despite these efforts, the available information and resources remain insufficient for achieving commercial-level performance. In this report, we open the black box and introduce Allegro, an advanced video generation model that excels in both quality and temporal consistency. We also highlight the current limitations in the field and present a comprehensive methodology for training high-performance, commercial-level video generation models, addressing key aspects such as data, model architecture, training pipeline, and evaluation. Our user study shows that Allegro surpasses existing open-source models and most commercial models, ranking just behind Hailuo and Kling. Code, model, and gallery links are provided below.

Paper: https://arxiv.org/abs/2410.15458

Code: https://github.com/rhymes-ai/Allegro (coming soon)

Weights: https://huggingface.co/rhymes-ai/Allegro

Project Page: https://huggingface.co/blog/RhymesAI/allegro


Abstract

Recently, large-scale diffusion models have made impressive progress in text-to-image (T2I) generation. To further equip these T2I models with fine-grained spatial control, approaches like ControlNet introduce an extra network that learns to follow a condition image. However, for every single condition type, ControlNet requires independent training on millions of data pairs with hundreds of GPU hours, which is quite expensive and makes it challenging for ordinary users to explore and develop new types of conditions. To address this problem, we propose the CtrLoRA framework, which trains a Base ControlNet to learn the common knowledge of image-to-image generation from multiple base conditions, along with condition-specific LoRAs to capture distinct characteristics of each condition. Utilizing our pretrained Base ControlNet, users can easily adapt it to new conditions, requiring as few as 1,000 data pairs and less than one hour of single-GPU training to obtain satisfactory results in most scenarios. Moreover, our CtrLoRA reduces the learnable parameters by 90% compared to ControlNet, significantly lowering the threshold to distribute and deploy the model weights. Extensive experiments on various types of conditions demonstrate the efficiency and effectiveness of our method. Code and model weights will be released at https://github.com/xyfJASON/ctrlora.

Paper: https://arxiv.org/abs/2410.09400

Code: https://github.com/xyfJASON/ctrlora

Weights: https://huggingface.co/xyfJASON/ctrlora/tree/main
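
As a rough, hedged illustration of the "frozen shared base plus small per-condition LoRA" idea described in the abstract (the class, names, and rank below are made up for the example and are not the released CtrLoRA code):

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Toy sketch: a frozen base layer (standing in for the shared Base ControlNet)
    plus a small per-condition low-rank adapter that is the only thing trained."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # common knowledge stays frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)   # trainable
        self.up = nn.Linear(rank, base.out_features, bias=False)    # trainable
        nn.init.zeros_(self.up.weight)       # start as a no-op on top of the base layer

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))

base = nn.Linear(320, 320)                   # pretend this layer came from the Base ControlNet
adapted = LoRALinear(base, rank=8)           # one such adapter per new condition type
trainable = sum(p.numel() for p in adapted.parameters() if p.requires_grad)
total = sum(p.numel() for p in adapted.parameters())
print(f"trainable fraction: {trainable / total:.1%}")  # only the small LoRA is trained
```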


Abstract

Diffusion models, such as Stable Diffusion, have made significant strides in visual generation, yet their paradigm remains fundamentally different from autoregressive language models, complicating the development of unified language-vision models. Recent efforts like LlamaGen have attempted autoregressive image generation using discrete VQVAE tokens, but the large number of tokens involved renders this approach inefficient and slow. In this work, we present Meissonic, which elevates non-autoregressive masked image modeling (MIM) text-to-image to a level comparable with state-of-the-art diffusion models like SDXL. By incorporating a comprehensive suite of architectural innovations, advanced positional encoding strategies, and optimized sampling conditions, Meissonic substantially improves MIM's performance and efficiency. Additionally, we leverage high-quality training data, integrate micro-conditions informed by human preference scores, and employ feature compression layers to further enhance image fidelity and resolution. Our model not only matches but often exceeds the performance of existing models like SDXL in generating high-quality, high-resolution images. Extensive experiments validate Meissonic's capabilities, demonstrating its potential as a new standard in text-to-image synthesis. We release a model checkpoint capable of producing 1024×1024 resolution images.

Paper: https://arxiv.org/abs/2410.08261

Code: https://github.com/viiika/Meissonic

Model: https://huggingface.co/MeissonFlow/Meissonic
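
For context on what "non-autoregressive masked image modeling" refers to, here is a hedged toy sketch of MaskGIT-style parallel decoding, the family of generators Meissonic builds on; the function and schedule are illustrative stand-ins, not Meissonic's actual sampler:

```python
import torch

def mim_generate(predict_logits, num_tokens=16, steps=4, mask_id=1024):
    """Toy sketch of non-autoregressive masked image modeling: start from all
    [MASK] tokens, predict every position in parallel, keep the most confident
    predictions, and re-mask the rest for the next refinement step.
    `predict_logits` stands in for the text-conditioned transformer and maps a
    (num_tokens,) token sequence to (num_tokens, vocab_size) logits."""
    tokens = torch.full((num_tokens,), mask_id)          # start fully masked
    for step in range(steps):
        logits = predict_logits(tokens)                  # predict all positions at once
        confidence, predictions = logits.softmax(-1).max(-1)
        masked = tokens == mask_id
        tokens = torch.where(masked, predictions, tokens)
        keep = int(num_tokens * (step + 1) / steps)      # reveal more tokens each step
        # already-decided positions get confidence 1.0 so they are never re-masked
        confidence = torch.where(masked, confidence, torch.ones_like(confidence))
        remask = confidence.argsort()[: num_tokens - keep]
        tokens[remask] = mask_id
    return tokens

# Usage with a random stand-in "model" over a 1024-entry codebook:
fake_model = lambda toks: torch.randn(toks.shape[0], 1024)
print(mim_generate(fake_model))
```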
