this post was submitted on 18 Sep 2024
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/tom83_be on 2024-09-17 18:21:14+00:00.


Update: Now runs with about 7 GB VRAM, see bold text on updated settings below!

I posted a guide (basically working settings) for OneTrainer LoRA/DoRA training here. There was a question about support for 8 GB VRAM. I tried a few settings and it seems to run at just below 8 GB VRAM. Since I do not own such a card, I need people with these cards to validate it (there may be VRAM spikes that I do not see).

Please do the following:

  • Use the settings provided here:
  • EMA OFF (training tab) => maybe not needed, see update below
  • Rank = 16, Alpha = 16 (LoRA tab)
  • Activating "fused back pass" in the optimizer settings (training tab) seems to yield another 100 MB of VRAM savings => maybe not needed, see update below
  • Setting "LoRA weight data type" (LoRA tab) to bfloat16 again saves some VRAM => maybe not needed, see update below
  • Update: You can also set "gradient checkpointing" to "CPU_OFFLOADED" in the "training" tab. After that it runs with less than 7 GB VRAM, but a bit slower for me (3.7 s/it vs. 3.4 s/it). Thanks to u/setothegreat for that idea! If you keep EMA enabled, still use float32 as the "LoRA weight data type" and also do not activate "fused back pass", it still runs at 7.2 GB VRAM and 3.9 s/it for me. So it might be enough to change this one setting.
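
Collected in one place, the changed settings look roughly like this. This is a sketch only: the key names below are illustrative, not OneTrainer's actual config schema, so treat it as a checklist rather than a usable config file.

```python
# Hypothetical summary of the changed settings; key names are
# illustrative and do NOT match OneTrainer's real config keys.
low_vram_settings = {
    "ema": "OFF",                               # training tab; maybe not needed
    "lora_rank": 16,                            # LoRA tab
    "lora_alpha": 16,                           # LoRA tab
    "fused_back_pass": True,                    # optimizer settings; ~100 MB saved
    "lora_weight_dtype": "bfloat16",            # LoRA tab; maybe not needed
    "gradient_checkpointing": "CPU_OFFLOADED",  # training tab; biggest saving
}

# Print the checklist in a stable order.
for key, value in sorted(low_vram_settings.items()):
    print(f"{key} = {value}")
```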

It now trains with just below 7.8/7.9 GB of VRAM. I would like to get feedback from 8 GB VRAM users on whether this works.
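
To check for the spikes mentioned above, one option (my suggestion, not something from the original post) is to log `nvidia-smi` memory readings once per second during training and track the peak. A minimal parser for such a log:

```python
def peak_vram_mib(log_lines):
    """Return the highest memory.used reading (in MiB) from lines like
    '7854 MiB', as produced by:
      nvidia-smi --query-gpu=memory.used --format=csv,noheader -l 1
    """
    peak = 0
    for line in log_lines:
        line = line.strip()
        if line.endswith("MiB"):
            peak = max(peak, int(line.split()[0]))
    return peak

# Example with made-up readings (not real measurements):
sample = ["7210 MiB", "7854 MiB", "7612 MiB"]
print(peak_vram_mib(sample))  # -> 7854
```

If the peak ever crosses your card's capacity, training will spill or crash even though the steady-state usage looks fine.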

I also cannot give any guarantee on the quality/success of the training! Let's find out together!

PS: I am using my card for training/AI only; the operating system is using the integrated GPU, so all of my VRAM is free. For 8 GB VRAM users this might be crucial to getting it to work...
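
As a side note on the s/it figures above: total training time is just steps multiplied by seconds per iteration. The step count below is a made-up example; only the s/it values come from the post.

```python
def eta_hours(steps, sec_per_it):
    """Estimated wall-clock training time in hours."""
    return steps * sec_per_it / 3600

# Hypothetical 3000-step run at the reported speeds:
print(round(eta_hours(3000, 3.4), 2))  # baseline settings
print(round(eta_hours(3000, 3.7), 2))  # with CPU-offloaded gradient checkpointing
```

So the CPU_OFFLOADED slowdown costs roughly a quarter hour over 3000 steps, which seems a fair trade for fitting into 7 GB.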
