Stable Diffusion


Discuss matters related to our favourite AI Art generation technology


This is a copy of the /r/stablediffusion wiki, to help people who need access to that information.


Howdy and welcome to r/stablediffusion! I'm u/Sandcheeze and I have collected these resources and links to help you enjoy Stable Diffusion, whether you are here for the first time or looking to add more customization to your image generations.

If you'd like to show support, feel free to send us kind words or check out our Discord. Donations are appreciated but not necessary; being a great part of the community is all we ask for.

Note: The community resources provided here are not endorsed, vetted, or provided by Stability AI.

# Stable Diffusion

Local Installation

Active Community Repos/Forks to install on your PC and keep it local.

Online Websites

Websites with usable Stable Diffusion right in your browser. No need to install anything.

Mobile Apps

Stable Diffusion on your mobile device.

Tutorials

Learn how to improve your skills with Stable Diffusion, whether you are a beginner or an expert.

DreamBooth

How to train a custom model, plus resources on doing so.

Models

Specially trained towards certain subjects and/or styles.

Embeddings

Tokens trained on specific subjects and/or styles.

Bots

Either bots you can self-host, or bots you can use directly on various websites and services such as Discord, Reddit, etc.

3rd Party Plugins

SD plugins for programs such as Discord, Photoshop, Krita, Blender, GIMP, etc.

Other useful tools

# Community

Games

  • PictionAIry: (Video | 2-6 Players) - The image-guessing game where AI does the drawing!

Podcasts

Databases or Lists

Still updating this with more links as I collect them all here.

FAQ

How do I use Stable Diffusion?

  • Check out our guides section above!

Will it run on my machine?

  • Stable Diffusion requires a GPU with at least 4GB of VRAM to run locally. Much beefier graphics cards (10-, 20-, or 30-series Nvidia cards) are needed to generate high-resolution or high-step images. Alternatively, anyone can run it online through DreamStudio or by hosting it on their own GPU compute cloud server. (A quick local check is sketched after this list.)
  • Only Nvidia cards are officially supported.
  • AMD support is available here unofficially.
  • Apple M1 Chip support is available here unofficially.
  • Intel based Macs currently do not work with Stable Diffusion.
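
For a quick local check of the requirement above, here is a minimal, hedged Python sketch (it assumes PyTorch is installed; the 4GB threshold mirrors the guidance above and is not a hard limit):

```python
import torch

# Minimal sketch: report whether a CUDA GPU is visible and how much VRAM it has.
# The 4 GB threshold follows the FAQ guidance above; it is not a hard requirement.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
    print("Likely OK for local Stable Diffusion" if vram_gb >= 4 else "Below the suggested 4 GB of VRAM")
else:
    print("No CUDA GPU detected; consider an online service such as DreamStudio.")
```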

How do I get a website or resource added here?

If you have a suggestion for a website or a project to add to our list, or if you would like to contribute to the wiki, please don't hesitate to reach out to us via modmail or message me.


A quantized version of ControlNet Union for Flux, for less powerful computers.


TL;DR

A new post-training quantization paradigm for diffusion models that quantizes both the weights and activations of FLUX.1 to 4 bits, achieving a 3.5× memory and 8.7× latency reduction on a 16GB laptop 4090 GPU.

Paper: http://arxiv.org/abs/2411.05007

Weights: https://huggingface.co/mit-han-lab/svdquant-models

Code: https://github.com/mit-han-lab/nunchaku

Blog: https://hanlab.mit.edu/blog/svdquant

Demo: https://svdquant.mit.edu/
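
To make the idea concrete, here is a toy, hedged sketch of the "small high-precision low-rank branch plus 4-bit residual" decomposition that SVDQuant is built around; it only illustrates the concept and is not the released calibration code, CUDA kernels, or the nunchaku API:

```python
import torch

def svd_lowrank_plus_int4(W: torch.Tensor, rank: int = 32):
    """Toy illustration of the SVDQuant idea: keep a small low-rank branch of a
    weight matrix in high precision and (fake-)quantize the residual to 4 bits.
    Conceptual sketch only, not the paper's kernels or calibration."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    L = U[:, :rank] * S[:rank]            # (out, rank) low-rank factor, high precision
    R = Vh[:rank]                         # (rank, in)
    residual = W - L @ R
    scale = residual.abs().max() / 7.0    # symmetric int4 grid covers [-8, 7]
    q = torch.clamp(torch.round(residual / scale), -8, 7)
    return L, R, q.to(torch.int8), scale  # int8 used as a container for 4-bit values

W = torch.randn(256, 256)
L, R, q, scale = svd_lowrank_plus_int4(W)
W_hat = L @ R + q.float() * scale         # dequantized approximation of W
print(f"mean abs error: {(W - W_hat).abs().mean().item():.4f}")
```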


Abstract

Diffusion models have demonstrated excellent capabilities in text-to-image generation. Their semantic understanding (i.e., prompt-following) ability has also been greatly improved with large language models (e.g., T5, Llama). However, existing models cannot perfectly handle long and complex text prompts, especially when the text prompts contain various objects with numerous attributes and interrelated spatial relationships. While many regional prompting methods have been proposed for UNet-based models (SD1.5, SDXL), there are still no implementations based on the recent Diffusion Transformer (DiT) architecture, such as SD3 and FLUX.1. In this report, we propose and implement regional prompting for FLUX.1 based on attention manipulation, which enables DiT with fine-grained compositional text-to-image generation capability in a training-free manner. Code is available at https://github.com/instantX-research/Regional-Prompting-FLUX.

Paper: https://arxiv.org/abs/2411.02395

Code: https://github.com/instantX-research/Regional-Prompting-FLUX
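
As a rough illustration of what "regional prompting via attention manipulation" means, here is a hedged, conceptual PyTorch sketch in which each image token attends only to the prompt embedding of the region it belongs to; the function, shapes, and masking scheme are illustrative assumptions, not the repository's actual implementation:

```python
import torch
import torch.nn.functional as F

def regional_cross_attention(img_queries, region_prompt_embeds, region_masks):
    """Conceptual sketch of regional prompting: image tokens inside a region
    attend only to that region's prompt embedding.

    img_queries:          (B, N_img, D) image-token queries
    region_prompt_embeds: list of (B, N_txt, D) text embeddings, one per region
    region_masks:         list of (B, N_img) boolean masks, one per region
    """
    out = torch.zeros_like(img_queries)
    for prompt_embed, mask in zip(region_prompt_embeds, region_masks):
        # keys/values come from this region's prompt only
        attended = F.scaled_dot_product_attention(img_queries, prompt_embed, prompt_embed)
        out = out + attended * mask.unsqueeze(-1).to(out.dtype)
    return out

# Tiny usage example with two regions splitting a 16-token "image" in half.
B, N_img, N_txt, D = 1, 16, 4, 8
q = torch.randn(B, N_img, D)
embeds = [torch.randn(B, N_txt, D), torch.randn(B, N_txt, D)]
masks = [torch.arange(N_img).unsqueeze(0) < 8, torch.arange(N_img).unsqueeze(0) >= 8]
print(regional_cross_attention(q, embeds, masks).shape)  # torch.Size([1, 16, 8])
```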

  • Add Intel Core Ultra Series 2 (Lunar Lake) NPU support by @rupeshs in #277
  • Seeding improvements by @wbruna in #273

Details: https://github.com/Nerogar/OneTrainer/blob/master/docs/RamOffloading.md

  • Flux LoRA training on 6GB GPUs (at 512px resolution)
  • Flux Fine-Tuning on 16GB GPUs (or even less), plus 64GB of system RAM
  • SD3.5-M Fine-Tuning on 4GB GPUs (at 1024px resolution)

Highlights for 2024-10-29

  • Support for all SD3.x variants
    SD3.0-Medium, SD3.5-Medium, SD3.5-Large, SD3.0-Large-Turbo
  • Allow quantization using bitsandbytes on-the-fly during model load
    Load any variant of SD3.x or FLUX.1 and apply quantization during load, without the need for pre-quantized models (see the sketch after this list)
  • Allow for custom model URL in standard model selector
    Can be used to specify any model from HuggingFace or CivitAI
  • Full support for torch==2.5.1
  • New wiki articles: Gated Access, Quantization, Offloading
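
For readers unfamiliar with what on-the-fly quantization during load looks like, here is a hedged sketch of the same general technique using the diffusers + bitsandbytes route (this is not SD.Next's internal code; it assumes a recent diffusers build with bitsandbytes support, and the model ID and 4-bit settings are illustrative):

```python
import torch
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel, StableDiffusion3Pipeline

# Quantize the SD3.5 transformer to 4-bit NF4 while loading,
# instead of downloading a pre-quantized checkpoint.
nf4 = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
transformer = SD3Transformer2DModel.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", subfolder="transformer",
    quantization_config=nf4, torch_dtype=torch.bfloat16)
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", transformer=transformer,
    torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # keep VRAM use low on smaller GPUs
image = pipe("a watercolor fox in a snowy forest", num_inference_steps=28).images[0]
image.save("fox.png")
```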

Plus tons of smaller improvements and cumulative fixes reported since the last release.

README | CHANGELOG | WiKi | Discord


Abstract

We propose Framer for interactive frame interpolation, which targets producing smoothly transitioning frames between two images as per user creativity. Concretely, besides taking the start and end frames as inputs, our approach supports customizing the transition process by tailoring the trajectory of some selected keypoints. Such a design enjoys two clear benefits. First, incorporating human interaction mitigates the issue arising from numerous possibilities of transforming one image to another, and in turn enables finer control of local motions. Second, as the most basic form of interaction, keypoints help establish the correspondence across frames, enhancing the model to handle challenging cases (e.g., objects on the start and end frames are of different shapes and styles). It is noteworthy that our system also offers an "autopilot" mode, where we introduce a module to estimate the keypoints and refine the trajectory automatically, to simplify the usage in practice. Extensive experimental results demonstrate the appealing performance of Framer on various applications, such as image morphing, time-lapse video generation, cartoon interpolation, etc. The code, the model, and the interface will be released to facilitate further research.

Paper: https://arxiv.org/abs/2410.18978

Code: https://github.com/aim-uofa/Framer

Project Page: https://aim-uofa.github.io/Framer/#comparison_with_baseline_container


Highlights for 2024-10-23

A month later and with nearly 300 commits, here is the latest SD.Next update!

Workflow highlights

  • Reprocess: New workflow options that let you generate at lower quality and then
    reprocess only selected images at higher quality, or generate without hires/refine and then reprocess with hires/refine,
    and you can pick any previous latent from the auto-captured history!
  • Detailer: Fully built-in detailer workflow with support for all standard models
  • Built-in model analyzer
    See all details of your currently loaded model, including components, parameter count, layer count, etc.
  • Extract LoRA: Load any LoRA(s) and generate as usual,
    and once you like the results, simply extract a combined LoRA for future use!

New models

What else?

  • Tons of work on dynamic quantization that can be applied on-the-fly during model load to any model type (you do not need to use pre-quantized models)
    Supported quantization engines include BitsAndBytes, TorchAO, Optimum.quanto, NNCF compression, and more...
  • Auto-detection of the best available device/dtype settings for your platform and GPU reduces the need for manual configuration
    Note: This is a breaking change to default settings, and it's recommended to check your preferred settings after upgrading
  • Full rewrite of sampler options, now far more streamlined, with tons of new options to tweak scheduler behavior
  • Improved LoRA detection and handling for all supported models
  • Several Flux.1 optimizations and new quantization types

Oh, and we've compiled a full table listing the top 30 most popular text-to-image generative models (how many have you tried?),
with their respective parameters and an architecture overview: Models Overview

And there are also other goodies like multiple XYZ grid improvements, additional Flux ControlNets, additional Interrogate models, better LoRA tags support, and more...
README | CHANGELOG | WiKi | Discord


Abstract

Significant advancements have been made in the field of video generation, with the open-source community contributing a wealth of research papers and tools for training high-quality models. However, despite these efforts, the available information and resources remain insufficient for achieving commercial-level performance. In this report, we open the black box and introduce Allegro, an advanced video generation model that excels in both quality and temporal consistency. We also highlight the current limitations in the field and present a comprehensive methodology for training high-performance, commercial-level video generation models, addressing key aspects such as data, model architecture, training pipeline, and evaluation. Our user study shows that Allegro surpasses existing open-source models and most commercial models, ranking just behind Hailuo and Kling. Code, model, and gallery links are provided below.

Paper: https://arxiv.org/abs/2410.15458

Code: https://github.com/rhymes-ai/Allegro (coming soon)

Weights: https://huggingface.co/rhymes-ai/Allegro

Project Page: https://huggingface.co/blog/RhymesAI/allegro


Abstract

Recently, large-scale diffusion models have made impressive progress in text-to-image (T2I) generation. To further equip these T2I models with fine-grained spatial control, approaches like ControlNet introduce an extra network that learns to follow a condition image. However, for every single condition type, ControlNet requires independent training on millions of data pairs with hundreds of GPU hours, which is quite expensive and makes it challenging for ordinary users to explore and develop new types of conditions. To address this problem, we propose the CtrLoRA framework, which trains a Base ControlNet to learn the common knowledge of image-to-image generation from multiple base conditions, along with condition-specific LoRAs to capture distinct characteristics of each condition. Utilizing our pretrained Base ControlNet, users can easily adapt it to new conditions, requiring as few as 1,000 data pairs and less than one hour of single-GPU training to obtain satisfactory results in most scenarios. Moreover, our CtrLoRA reduces the learnable parameters by 90% compared to ControlNet, significantly lowering the threshold to distribute and deploy the model weights. Extensive experiments on various types of conditions demonstrate the efficiency and effectiveness of our method. Code and model weights will be released at https://github.com/xyfJASON/ctrlora.

Paper: https://arxiv.org/abs/2410.09400

Code: https://github.com/xyfJASON/ctrlora

Weights: https://huggingface.co/xyfJASON/ctrlora/tree/main
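
As a rough, hedged illustration of the "frozen shared base plus small per-condition LoRA" idea described in the abstract (the class, names, and rank below are made up for the example and are not the released CtrLoRA code):

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Toy sketch: a frozen base layer (standing in for the shared Base ControlNet)
    plus a small per-condition low-rank adapter that is the only thing trained."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # common knowledge stays frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)   # trainable
        self.up = nn.Linear(rank, base.out_features, bias=False)    # trainable
        nn.init.zeros_(self.up.weight)       # start as a no-op on top of the base layer

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))

base = nn.Linear(320, 320)                   # pretend this layer came from the Base ControlNet
adapted = LoRALinear(base, rank=8)           # one such adapter per new condition type
trainable = sum(p.numel() for p in adapted.parameters() if p.requires_grad)
total = sum(p.numel() for p in adapted.parameters())
print(f"trainable fraction: {trainable / total:.1%}")  # only the small LoRA is trained
```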


Abstract

Diffusion models, such as Stable Diffusion, have made significant strides in visual generation, yet their paradigm remains fundamentally different from autoregressive language models, complicating the development of unified language-vision models. Recent efforts like LlamaGen have attempted autoregressive image generation using discrete VQVAE tokens, but the large number of tokens involved renders this approach inefficient and slow. In this work, we present Meissonic, which elevates non-autoregressive masked image modeling (MIM) text-to-image to a level comparable with state-of-the-art diffusion models like SDXL. By incorporating a comprehensive suite of architectural innovations, advanced positional encoding strategies, and optimized sampling conditions, Meissonic substantially improves MIM's performance and efficiency. Additionally, we leverage high-quality training data, integrate micro-conditions informed by human preference scores, and employ feature compression layers to further enhance image fidelity and resolution. Our model not only matches but often exceeds the performance of existing models like SDXL in generating high-quality, high-resolution images. Extensive experiments validate Meissonic's capabilities, demonstrating its potential as a new standard in text-to-image synthesis. We release a model checkpoint capable of producing 1024×1024 resolution images.

Paper: https://arxiv.org/abs/2410.08261

Code: https://github.com/viiika/Meissonic

Model: https://huggingface.co/MeissonFlow/Meissonic
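
For context on what "non-autoregressive masked image modeling" refers to, here is a hedged toy sketch of MaskGIT-style parallel decoding, the family of generators Meissonic builds on; the function and schedule are illustrative stand-ins, not Meissonic's actual sampler:

```python
import torch

def mim_generate(predict_logits, num_tokens=16, steps=4, mask_id=1024):
    """Toy sketch of non-autoregressive masked image modeling: start from all
    [MASK] tokens, predict every position in parallel, keep the most confident
    predictions, and re-mask the rest for the next refinement step.
    `predict_logits` stands in for the text-conditioned transformer and maps a
    (num_tokens,) token sequence to (num_tokens, vocab_size) logits."""
    tokens = torch.full((num_tokens,), mask_id)          # start fully masked
    for step in range(steps):
        logits = predict_logits(tokens)                  # predict all positions at once
        confidence, predictions = logits.softmax(-1).max(-1)
        masked = tokens == mask_id
        tokens = torch.where(masked, predictions, tokens)
        keep = int(num_tokens * (step + 1) / steps)      # reveal more tokens each step
        # already-decided positions get confidence 1.0 so they are never re-masked
        confidence = torch.where(masked, confidence, torch.ones_like(confidence))
        remask = confidence.argsort()[: num_tokens - keep]
        tokens[remask] = mask_id
    return tokens

# Usage with a random stand-in "model" over a 1024-entry codebook:
fake_model = lambda toks: torch.randn(toks.shape[0], 1024)
print(mim_generate(fake_model))
```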
