this post was submitted on 01 Nov 2024

AI Generated Images


ComfyUI is a locally-run UI (i.e. you need to have a beefy local GPU) that's generally more complicated to learn than the older, more common Automatic1111. However, it works kind of like some professional image-processing software does: it lets you create a directed acyclic graph (DAG) of processing steps and build up complicated workflows from it. It's also got some nice features, like the ability to queue up multiple requests (it's not the only UI to do this, but the lack of it is kind of a glaring limitation in Automatic1111). ComfyUI also caches what it has generated while traversing the DAG, and only regenerates the data that is necessary. It's also capable of running Flux, which Automatic1111 is not, and while Flux is not today a full replacement for everything I have done with Stable Diffusion, it can do a lot, has a far better ability to understand natural-language descriptions of scenes, and has less propensity to generate mangled images. As a result, I generally use ComfyUI rather than Automatic1111 now; it's a more "serious" tool for building up complex stuff.

Automatic1111 mostly consists of a positive prompt, a negative prompt, and maybe fields for any plugins you've installed. The features for generating an image from text (txt2img) and for processing existing images (img2img) are entirely decoupled. Since one often wants to generate an image and then process it, this makes image generation a multi-stage process that requires a lot of manual involvement.

In Automatic1111, some information about how the image was generated is displayed beneath the image. You can copy this from the Web UI. It'll contain the prompt text and -- I think -- all of the settings in the txt2img panel, at least as long as any plugins used properly implement this feature. So by posting that text, you can let someone else see what you did and recreate the image; it's effectively the "source code" of your image. When I post an image generated in Automatic1111, I paste this text beneath the image. It contains stuff like the scheduler used, how many iterations were run, and so forth.

However:

  • It doesn't contain system settings, some of which are necessary to recreate the image.

  • Running txt2img and img2img are two different steps. So, for example, if you want to generate an image in txt2img at a lower resolution, then take the resulting image and upscale it in img2img using the SD Ultimate Upscale plugin -- something that I generally do -- then these are two different operations.

  • While it's nice that it's human-readable, it's not machine-readable. Someone else can't just paste the text into their Automatic1111 setup and have it import those settings. They need to manually enter them.

In contrast, ComfyUI lets someone save their entire DAG workflow by just clicking "save". It'll generate a JSON file that anyone else can open and use (and tweak, if they want). This is much, much nicer in my book, since it lets me share the "source code" to my images with other users in a way that's trivial for them to import.

Both Automatic1111 and ComfyUI attach their text to the image in EXIF tags and indicate that the image is AI-generated (probably good practice to help avoid polluting later training data). On Linux, I can use ImageMagick to see this, for example by running $ identify -verbose image.png. Unfortunately, Lemmy, or Pictrs, or whatever handles image posting on Lemmy instances, appears to strip EXIF data out of posted images, maybe to keep people from inadvertently doxxing themselves, as some cameras attach GPS EXIF data; people have inadvertently doxxed themselves this way in the past on Reddit and probably other platforms. So end users here don't get to actually see the embedded source to generated images, even though both Automatic1111 and ComfyUI do their best to include it. While I don't post images on civitai.com, I suspect that the way they get all of their image metadata is by inspecting the EXIF data about the generation process in posted images...but that's not an option here.

I've generally tried to just manually post it, which I think is good practice.
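For anyone without ImageMagick who wants a rough check of whether a downloaded PNG still carries embedded text, here is a minimal sketch. It assumes the generation text is stored in PNG text chunks (chunk types tEXt, zTXt, or iTXt); that is an assumption here, not something the post confirms. The helper name png_has_text_chunk is hypothetical, and the check is a byte-level heuristic rather than a real PNG parser.

```shell
#!/bin/bash
# png_has_text_chunk FILE   (hypothetical helper name)
# Succeeds if FILE contains the byte sequence of a PNG text chunk type.
# Heuristic only: it does not parse the chunk structure, so compressed
# pixel data could in principle contain the same bytes by coincidence.
png_has_text_chunk() {
    # -a treats binary input as text, -F matches fixed strings, -q is silent.
    LC_ALL=C grep -a -q -F -e 'tEXt' -e 'zTXt' -e 'iTXt' -- "$1"
}

# Usage:
#   if png_has_text_chunk image.png; then
#       echo "text chunk found"
#   else
#       echo "no text chunk found (metadata probably stripped)"
#   fi
```

Running it on the original file and on the copy downloaded back from a post gives a quick before-and-after comparison.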

For posting ComfyUI JSON workflows, there are two issues.

  • First, compared to the text that Automatic1111 provides, the JSON is quite verbose. It's pretty-printed, which makes it easier to read, but even longer. Trying to post it verbatim in the text after each image would take up a great amount of screen space for users here.

  • Second, while it's readable, what an end user is probably most often interested in is just the prompt and maybe the model used. Digging through a lot of JSON for that is a pain -- it's readable, but certainly not as readable as Automatic1111's summary.

What I've settled on -- and what I think is a good convention, if you're using ComfyUI and want to share the "source code" to your image -- is to post the model name and prompt manually beneath the image, since this lets someone just skimming the image quickly see and understand the basic settings used to generate the image.

I then also save the workflow from ComfyUI, generate a compressed, Base64-encoded version of that text, and attach it in spoiler tags. Lemmy, mbin, and AFAIK piefed and sublinks cannot attach files to posts, so this is probably the closest analog presently available for embedding a small file in a post.

I use xz (which does LZMA compression), as it's an open-source compressor widely-available on Linux systems; LZMA is one of the "tightest" of the mainstream compression mechanisms. While it's far from the fastest, the workflow files are tiny, so speed does not matter.
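To see how much xz actually saves on a particular workflow file, the sizes can be measured directly. This is a minimal sketch: compare_sizes is a hypothetical helper name, and gzip appears only as a point of comparison (it is not part of the pipeline described here). For very small inputs, each format's fixed header overhead can matter as much as the algorithm, so it is worth checking on your own files.

```shell
#!/bin/bash
# compare_sizes FILE   (hypothetical helper name)
# Prints the size in bytes of FILE as-is, gzip-compressed, and xz-compressed.
compare_sizes() {
    local file="$1"
    # $(( ... )) strips any padding that wc may add around the number.
    printf 'raw   %d\n' "$(( $(wc -c <"$file") ))"
    printf 'gzip  %d\n' "$(( $(gzip -9 <"$file" | wc -c) ))"
    printf 'xz    %d\n' "$(( $(xz -9 <"$file" | wc -c) ))"
}

# Usage: compare_sizes workflow.json
```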

xz produces binary output, which cannot be posted directly. I then Base64-encode it, which generates a text string using characters that can be posted here. Various mechanisms have been used for this on Usenet in the past, including uuencode and yEnc, so making use of that existing work makes sense. I think that Base64, while not as compact as yEnc, is the most widely used today, and I know that the restrictive character set it uses doesn't smack into any Markdown encoding issues, so I went with that.
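The cost of Base64 is easy to quantify: every 3 input bytes become 4 output characters, with the final group padded, so n bytes encode to 4 * ceil(n / 3) characters before any line breaks or indentation are added. A small sketch to confirm the arithmetic; b64_chars is a hypothetical helper name.

```shell
#!/bin/bash
# b64_chars N   (hypothetical helper name)
# Prints how many Base64 characters N input bytes produce, not counting
# the newlines that base64 inserts when it wraps its output.
b64_chars() {
    local n
    n=$(head -c "$1" /dev/zero | base64 | tr -d '\n' | wc -c)
    echo "$(( n ))"
}

# Usage:
#   b64_chars 300   # prints 400
#   b64_chars 1     # prints 4 (one byte still needs a full padded group)
```

So a compressed workflow grows by roughly a third when encoded, plus a little more for line breaks and the four-space prefix on each line.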

I've written a shell script that can do this automatically for other folks using Linux. It takes an existing workflow file, xz-compresses it, Base64-encodes it, prefixes each line with four spaces so that Markdown displays the text literally rather than reflowing it, and then wraps the text in spoiler tags labeled with the filename, so that in the Lemmy Web UI and any other clients that support it, the text is collapsed by default. If anyone else is interested:

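mylemmybinaryencode.sh

#!/bin/bash

# Encode a file such that it can be pasted into a Lemmy markdown comment.
# Depends on coreutils, sed, and xz-utils.
# Usage: mylemmybinaryencode.sh <filename>

# Take only the first argument as the file to encode.
filename="$1"

# Open a spoiler block labeled with the name of the encoded file.
echo "::: spoiler $filename.xz.base64"
# Compress, Base64-encode, and indent by four spaces so Markdown shows it literally.
xz <"$filename" | base64 - | sed "s/^/    /"
# Close the spoiler block.
echo ":::"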

Or, packed by itself:

mylemmybinaryencode.sh.xz.base64

/Td6WFoAAATm1rRGBMDiAZkCIQEWAAAAAAAAALq0iivgARgA2l0AEYhCRj30GGqmZ696n29pZ/wy
PqoDRKrP1e/xkfKsvL3J6/JBbESIPdita8Z9IMCRYuI3nDfnFrBIvwtRBCG5J+fDj7GChWZjfgeA
kL5tWCWlAcpEnmNTJMlyQTDSK6iLBMF5ZaJvRY9t9iLbcg43dsdZNzeLULqatpbJe1mCZXSW4v6w
+lPm/welW7rbmCnsLrN0jnxc97O/hOlwp9UgtdMD0sc1Z5n9oghIQCi7NfD0mxwGSoerSr4SI2LI
FkL+X6CrJc8zSvDY5PPs5DvSXp37EboLt9KwG24AAACwUZBzqi5LkwAB/gGZAgAARP9hErHEZ/sC
AAAAAARZWg==

On Debian-family Linux systems (most, these days), you'll need to have the xz-utils, sed, and coreutils packages installed. I suspect that pretty much all Linux systems out there already have these pre-installed, as they're very common utilities.

To decode such a file, one can do:

$ base64 -d <filename.xz.base64 | xz -d >filename

I'm sure that there are utilities on other platforms that can also handle Base64 and xz-compressed data (I've several other packages installed on my own), but that uses probably the most-widely-available utilities to do so.
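If the block is copied from the raw Markdown rather than from the rendered post, it still has the spoiler lines and the four-space indentation, and GNU base64 -d rejects the embedded spaces. A minimal sketch of a companion decoder that strips those first; lemmy_decode is a hypothetical name and not part of the script above.

```shell
#!/bin/bash
# lemmy_decode   (hypothetical helper name)
# Reads a pasted block on stdin and writes the decoded file to stdout.
lemmy_decode() {
    # 1. Drop the "::: spoiler ..." and closing ":::" wrapper lines.
    # 2. Remove all whitespace (indentation and newlines), leaving only
    #    Base64 characters.
    # 3. Base64-decode, then xz-decompress.
    sed '/^[[:space:]]*:::/d' | tr -d '[:space:]' | base64 -d | xz -d
}

# Usage: lemmy_decode <pasted.txt >workflow.json
```

It also accepts a clean Base64 file with no wrapper, since the first two steps then have nothing to remove.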

Just wanted to provide at least what I do when posting images, and explain why.

top 1 comments
[–] aard@kyu.de 1 points 3 weeks ago

Also worth mentioning is that there's a plugin for Krita which allows both generating and inpainting from inside Krita. Especially for inpainting, you can get incredible results by combining it with proper selections from inside Krita.