this post was submitted on 07 Jul 2024
48 points (88.7% liked)

AI Generated Images

[–] j4k3@lemmy.world 2 points 4 months ago

Every character matters in AI, including the spaces between words and the punctuation. "Womens" is not a word in English: "women" is already the plural of "woman," so it needs an apostrophe-s ("women's") to mark the possessive.

In generative image AI, the tokenized model input is harder to inspect because monitoring tools are not integrated into Automatic1111 or ComfyUI by default, the way the feature is integrated into Oobabooga Textgen for LLMs. Monitoring the tokenized input would show how the word was either omitted entirely or broken down into smaller pieces, possibly single letters; at least, that is how LLMs handle tokenization of unknown words.
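To make the fallback behavior concrete, here is a toy greedy subword tokenizer. The vocabulary is hypothetical, chosen only to illustrate how an out-of-vocabulary word like "womens" gets split into pieces rather than matched whole; real CLIP/LLM tokenizers use learned BPE merges, but the failure mode is similar.

```python
# Hypothetical toy vocabulary -- real tokenizers learn theirs from data.
TOY_VOCAB = {"women", "woman", "'s", "s", "w", "o", "m", "e", "n"}

def tokenize(word: str) -> list[str]:
    """Greedily match the longest vocabulary entry at each position,
    falling back toward single characters, BPE-style."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest match first
            if word[i:j] in TOY_VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # character not in vocabulary
            i += 1
    return tokens

print(tokenize("womens"))   # ['women', 's'] -- split, not one token
print(tokenize("women's"))  # ['women', "'s"]
```

The model never sees "womens" as a unit; it sees fragments whose learned associations may pull the image in unintended directions.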

You should always keep in mind that every word and style you use in a prompt must correlate with tags that were trained alongside the images. Many models are trained on natural language captions, so they have some degree of natural language processing, but it is not as sophisticated as in a text-to-text model, where complex special tokens connect the input to the output.
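As a rough illustration, you can think of prompt effectiveness as a coverage check against the training vocabulary. The tag list and helper below are entirely hypothetical; real models encode everything into embeddings rather than doing literal lookups, but the principle holds: words with no learned association contribute little.

```python
# Hypothetical tag vocabulary; real models learn tags/captions during training.
TRAINED_TAGS = {"portrait", "woman", "women", "forest", "sunset", "watercolor"}

def prompt_coverage(prompt: str) -> dict[str, bool]:
    """Report which prompt words match a known training tag.
    Unmatched words have little or no learned meaning to the model."""
    return {word: word in TRAINED_TAGS for word in prompt.lower().split()}

coverage = prompt_coverage("watercolor portrait of a womens forest")
print(coverage)  # 'womens' -> False: it never appeared as a training tag
```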

The way tokens are processed is a major aspect of the evolution of generative AI. The first Stable Diffusion 1.x models use a single CLIP L text encoder, which is a very small language processing model. The SDXL models use a dual setup with CLIP L and CLIP G in tandem. The latest Stable Diffusion model, SD3, uses a triple setup: CLIP L and CLIP G alongside a full T5-XXL text-to-text large language model.

I haven't gone super in depth trying to understand the SD3 codebase, but something weird is happening with the T5: SD3 swaps an entire tensor layer each time the model loads instead of shipping a pretrained model or using a LoRA layering scheme. Safety with generative image AI is different from LLMs; it is not part of the model in the same way that safety works for an LLM. I found it fascinating that SD3 omits human genitalia, and I started looking into the ComfyUI code as a result, because this behavior is deterministic and therefore not part of the actual tensor math. The behavior centers around the T5 model... Anyways, I'm getting stupid technical on a tangent.

What I meant to say is that the text processing and tokenization are external to the tensor tables of the actual generative model. If the processing scheme is complex enough, it might be possible to error-correct the prompt, but it is best to assume the prompt will be used exactly as it was submitted.
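The dual-encoder arrangement can be sketched in a few lines. The encoder functions below are random stand-ins, not real CLIP models; only the shapes are meaningful. CLIP L produces 768-dimensional embeddings and CLIP G produces 1280-dimensional ones, and SDXL concatenates them along the feature axis into a 2048-dimensional conditioning signal per token.

```python
import numpy as np

def clip_l(tokens: list[int]) -> np.ndarray:
    """Stand-in for CLIP L: one 768-dim embedding per token."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(tokens), 768))

def clip_g(tokens: list[int]) -> np.ndarray:
    """Stand-in for CLIP G: one 1280-dim embedding per token."""
    rng = np.random.default_rng(1)
    return rng.standard_normal((len(tokens), 1280))

def encode_prompt(tokens: list[int]) -> np.ndarray:
    # SDXL-style conditioning: concatenate both encoders' outputs
    # along the feature axis (768 + 1280 = 2048 dims per token).
    return np.concatenate([clip_l(tokens), clip_g(tokens)], axis=-1)

cond = encode_prompt([101, 202, 303])
print(cond.shape)  # (3, 2048)
```

SD3's triple setup extends the same idea with T5-XXL output; the key point is that all of this happens before the diffusion model's tensors ever see the prompt.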