Easier Typed than Done

Generative AI, like ChatGPT, Stable Diffusion, and Midjourney, is here, and just about anyone can use it. It sounds simple enough: you enter some words, like “cat riding on the back of a flying dragon,” and you get an image. But it might not be as easy as you think once you see the results, especially if you had something specific in mind. Fortunately, there are some tricks that can help.

In my attempts to get acquainted with Stable Diffusion, I have tried using it for a variety of purposes, from making random pics for my blog articles to fan art, character design, and even birthday gifts. It isn’t actually as easy as it appears to be. I am far from an expert, but I have played around with it enough to have identified several of the failures and shortcomings that accompany the current generation of AI art generators. Everyone knows about the terrible eyes and the wonky hands, but getting the right textures, consistency, and even multiple subjects can be just as difficult to deal with.

Created in Stable Diffusion

I’ve been experimenting with Generative AI for a little while now, and I have gotten some pretty amazing results. That said, while it is really easy to type in a few words and get a surprisingly good picture (if you ignore the implications of theft involved in using models built on other artists’ work), using it to make something specific is a lot more difficult than it sounds. The more prompts and negative prompts you use (prompts being the words you want the AI to use, and negative prompts being the words you want it to avoid), the more likely you are to get something unpredictable or random. Sometimes fewer prompts are better, which is part of the problem. One Stable Diffusion user online said that fewer prompts give him better results, but after looking at his prompts, it was clear he didn’t have an image in mind and was just letting the AI do the heavy lifting. If I don’t specify the exact subject, pose, lighting, style, background, and so on that I want, and instead let the AI decide all of that, is it really my work? Directing AI to do something very specific can be very difficult. But it is possible.

Starting Promptly

First, there is a recommended minimum you should cover in your prompts: at the very least, the Subject, the Medium, the Style, and the Quality. There’s always more you can add, like Mood, Website associations (like ArtStation), specific Artists, Colors, Lighting, and Themes. Everything you type is processed in blocks of 75 tokens at a time, and more blocks are added automatically as your prompt grows, so you don’t really need to think about it. However, once the prompt gets long enough, it can start taking more memory and more time, so be aware of that.
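
If it helps to see that structure spelled out, here’s a minimal sketch in Python of how I think about assembling those four pieces. The wording is just an invented example, and the “assembly” is nothing fancier than gluing the parts together with commas:

# A rough sketch of the "minimum" prompt structure; the wording is made up.
subject = "a cat riding on the back of a flying dragon"
medium = "digital painting"
style = "fantasy concept art, dramatic lighting"
quality = "extremely detailed, 8k wallpaper"

prompt = ", ".join([subject, medium, style, quality])
print(prompt)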

Equally important are the Negative Prompts, which work by the same rules but remove things from your images. Usually, these are needed to avoid bad anatomy, poor quality, and sloppy composition, to name a few. There are some you will always want to include, or at least some variation of them. The most popular ones I’ve seen include:

ugly, poorly drawn hands, poorly drawn feet, poorly drawn eyes, poorly drawn face, tiling, out of frame, disfigured, extra limbs, extra digits, missing limbs, missing digits, text, signature, watermark, bad anatomy, cut off, underexposed, overexposed, bad quality, low quality, bad art, worse quality, lowres, distorted face, amateur, blurry
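
If you ever drive Stable Diffusion from Python with the Hugging Face diffusers library instead of a webui, the prompt and the negative prompt are just two arguments to the pipeline. This is only a sketch, assuming you have a GPU; the checkpoint ID is one public example and the prompt text is a placeholder:

import torch
from diffusers import StableDiffusionPipeline

# Load a base Stable Diffusion checkpoint (this model ID is one public example).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a cat riding on the back of a flying dragon, digital painting, extremely detailed",
    negative_prompt="ugly, poorly drawn hands, bad anatomy, watermark, text, blurry, lowres",
).images[0]
image.save("dragon_cat.png")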

Sometimes I will have more than 350 prompt keywords and 250 negative ones. But I don’t just add these willy-nilly. I think about which ones are effective, which ones aren’t working, which ones need more or less emphasis, and which ones are repetitive. And I don’t stop there. If I think it is getting too complicated, I will delete most, if not all, of my prompts and try again with a new set.

Some prompts simply work better than others. If you want an image to look highly detailed, you could use a prompt like “high detail,” but something like “8k wallpaper, extremely detailed” tends to be more effective. It doesn’t always work, but specific, well-chosen prompts generally get better results than generic ones. Knowing what works and what doesn’t isn’t just a matter of looking up available prompts or copying what other people used, because there is no set list and everyone is just BS-ing it right now anyway.

Also, changing your “Checkpoint” or “Model” can have very different effects on the same prompts. In most cases what you get will be pretty similar, but there are models designed for specific uses, like ones meant to design robots, that will give you very different results. And sometimes you’ll find a model that is intended for one thing but is actually incredible at something completely different. One I find hilariously ironic: a Checkpoint designed around making artistic nudes that is actually amazing at producing realistic cloth.
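
For what it’s worth, swapping Checkpoints in a scripted setup is just loading a different model repository or file. A quick diffusers sketch, where the IDs and paths are placeholders for whatever models you actually downloaded:

from diffusers import StableDiffusionPipeline

# A general-purpose base model (public example ID).
base = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# A custom checkpoint downloaded as a single .safetensors file (path is a placeholder).
custom = StableDiffusionPipeline.from_single_file("./models/my_custom_checkpoint.safetensors")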

Weight Limit

But what happens when a prompt works a little too well, or not well enough? That’s when you can resort to Weights. That’s right, nerd, do you even lift? Not that kind of weight. You can assign a weight to different prompt keywords to make them more or less intense. There are different ways of doing it. The most common I’ve seen is to add parentheses () or brackets [] around your prompt, which you can nest multiple times. You can also add a factor in the form of (prompt:1.5), which multiplies the standard intensity of that prompt by 1.5. You can do this to increase or decrease the strength of any prompt keyword, but if you go too high it will start to mess things up. Each set of parentheses () multiplies the keyword’s weight by 1.1, so (((prompt))) is the same as saying (prompt:1.33). Each set of brackets [] multiplies it by 0.9, so [[[prompt]]] is equal to (prompt:0.73). Because the math is too confusing for me, I prefer using the factor method when possible.
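
The math behind the nesting is just repeated multiplication. A quick sanity check in Python (keeping in mind the ()/[]/(prompt:1.5) syntax itself lives in the webui’s prompt box, not in Python):

# Each set of () multiplies the keyword's weight by 1.1; each set of [] by 0.9.
print(1.1 ** 3)  # 1.331 -> (((prompt))) is roughly (prompt:1.33)
print(0.9 ** 3)  # 0.729 -> [[[prompt]]] is roughly (prompt:0.73)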

Stable Diffusion, Prompt Scheduling example, dragon to cat

You can even use these to blend your prompts, with something called “Prompt Scheduling.” It looks like [prompt1:prompt2:factor]. Here, the factor is a value between 0 and 1 that determines at what point during rendering the AI switches from the first prompt to the second, which effectively controls how strong one prompt is versus the other. That way, they don’t compete with each other, and you can blend the prompts together. You can assign weights to this as well, but scheduling gives you more control over the two prompts acting as one. The only catch is that the first keyword determines the composition, and as the “steps” are performed, it switches from one to the other. Don’t be afraid to experiment with these, adding weights or blending prompts.
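
As far as I can tell, that factor is just the fraction of the total steps at which the switch happens. A tiny sketch of the arithmetic, using the dragon-to-cat example from the image above (the step count and factor are arbitrary):

# [dragon:cat:0.25] with 32 steps: the AI draws "dragon" for the first quarter
# of the steps, then finishes the image as "cat".
total_steps = 32
factor = 0.25
switch_step = int(total_steps * factor)
print(f"steps 1-{switch_step}: dragon, steps {switch_step + 1}-{total_steps}: cat")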

In addition to weights and scheduling, the order of the prompts matters. The earlier a prompt comes, the more important it is to the image. Just moving a keyword around will have an effect, often subtle, sometimes drastic, and it can make enough of a difference to fix or break an image.

The CFG Scale, or “Classifier Free Guidance Scale,” controls how strictly the AI follows your prompt versus how much “creativity” it is allowed. The higher the number, the more uniform and restrained the images will be; the lower, the more random and sporadic. Too low, and the image looks soft and abstract and falls apart. Too high, and it becomes harsh, contrasty, and surreal. You have to find a balance, and usually lower numbers work better. I personally prefer a setting of around 11, but changing this value can give you very different takes on the same thing.
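
In a scripted diffusers setup, the CFG Scale is the guidance_scale argument. A minimal sketch, again assuming an example public checkpoint and a placeholder prompt:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Higher guidance_scale sticks closer to the prompt; lower gives the AI more freedom.
image = pipe("a cat riding a flying dragon, digital painting", guidance_scale=11).images[0]
image.save("cfg_11.png")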

If you want to learn more about Prompts, here’s a handy guide: Stable Diffusion Prompt Guide.

Just A Sample

What’s an AI without something technical in the middle of it? That’s what the Sampling settings are. In fact, this is one of those things most people just ignore because it’s too technical for them. And I don’t blame them; I’m normally one of them.

But that doesn’t mean you should ignore it entirely. There are basically two aspects to sampling: the Method and the Steps. If you are familiar with how a 3D scene is rendered in something like Maya or Blender, you know that the more samples, the more passes over the image, and the better it looks. That’s roughly what the Steps are doing here, but the end result can vary more than you’d think. As a general rule of thumb with Generative AI, more steps are not always better. Often, you can use numbers as low as 8 steps and still get amazing results, and sometimes you can process something with 150 steps and not see any improvement at all. I typically set mine around 32, unless I’m experimenting, in which case I may lower it for faster results. If I find an image I really like, adding more steps can sometimes improve it or fix some of the problems, but not always. Playing with this value is not a bad idea, but only once you have something you already like and have locked in all the other values.
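
In the same kind of scripted sketch, the Steps are just num_inference_steps, which makes it easy to compare a quick draft against a slower pass. The checkpoint ID and prompt are placeholders again:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A fast, rough pass versus my usual setting; more steps is not always better.
draft = pipe("a cat riding a flying dragon", num_inference_steps=8).images[0]
final = pipe("a cat riding a flying dragon", num_inference_steps=32).images[0]
draft.save("draft_8_steps.png")
final.save("final_32_steps.png")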

The Sampling Method is the big question mark. Nobody knows what they are or what they do.
Okay, that’s not true, but I don’t know yet, and it’s very technical. Most people assume certain sampling methods are better for certain styles, but I haven’t seen anything to back that up. Basically, these determine how the model interprets your prompts and what it does with them. Changing this can drastically change your image. While it’s a good idea to experiment, randomly switching after you have something you like won’t give you predictable results.

There are lots of sites that compare the differences between the samplers, but most of them are not that useful, since the samplers can alter things so much and the explanations usually aren’t very good. However, a few samplers that other people have recommended to me are worth looking at. Euler (pronounced “oiler”) is the most commonly used one from what I’ve seen. It’s fast, reliable, and works well with few steps. Euler A is a sister sampler with similar qualities that gives far more creative, and often unpredictable, results. DDIM is another fast one that gives good results. I often switch between these three. LMS is another common one that runs fairly quickly and handles high step counts well. DPM2 is very popular and said to be one of the best, but it tends to be very slow. There are many varieties of it, and I don’t know the differences between them. It isn’t the best for experimenting with, thanks to the long render times, but some people swear by it.
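
If you’re scripting with diffusers, the sampling methods show up as interchangeable “schedulers” that you swap onto the pipeline. A sketch of switching between a few of the ones mentioned above (the checkpoint ID is a public example):

import torch
from diffusers import (
    StableDiffusionPipeline,
    EulerDiscreteScheduler,
    EulerAncestralDiscreteScheduler,
    DDIMScheduler,
)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Swap the sampler (scheduler) without reloading the model; Euler A in this case.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
# Or: EulerDiscreteScheduler.from_config(...) / DDIMScheduler.from_config(...)

image = pipe("a cat riding a flying dragon", num_inference_steps=20).images[0]
image.save("euler_a.png")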

The Magic Seed

The Seed is literally just a random number. Most of the time when you are generating images, the seed is set to something like -1, which is just the program’s way of saying it will pick a random value. That value is completely arbitrary and doesn’t have any specific meaning, but it is also incredibly important. The seed is the secret to getting one specific image. If you find an image you like, lock in the seed and then make changes. By doing that, the image will stay more or less the same as you alter the other values and prompts. Alter them enough, however, and you will still get very different results.

As I said, the value of the seed doesn’t matter until you find an image you like. Moving the value by just 1 will result in a completely different image; there is no rhyme or reason to it that I can explain. But if you reuse the same seed, you can get a better sense of the changes you make elsewhere while keeping mostly the same composition. The important thing to remember is that the seed determines which image you are getting, and if you lock it in, you can fine-tune that image.
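
Scripted, locking in the seed means passing a generator with a fixed value; the same seed, model, prompt, and settings should reproduce the same image. A sketch (the seed number itself is arbitrary):

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Lock in a seed so the composition stays put while you tweak everything else.
generator = torch.Generator("cuda").manual_seed(1234567)
image = pipe("a cat riding a flying dragon", generator=generator).images[0]
image.save("seed_1234567.png")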

NetWorking It!

AI Fan Art of Lum from Urusei Yatsura (2022), made in Stable Diffusion

Another way to add more control to what you are making is to use “Extra Networks.” These are user-made files that do specific things to your image. LoRAs, Hypernetworks, and Textual Inversions are the typical extra networks you can download and use. They tend to be a lot smaller than Checkpoints and load faster, and they work alongside the model, changing what it does in specific ways. There are LoRAs that will make everything look painted, and Textual Inversions (also called “embeddings”) that can change your subjects to look like a specific character. There are a ton of options! If you want to make fan art of Lum from the 2022 relaunch of Urusei Yatsura, there’s a tool just for that. In my admittedly limited experience, if you have a specific image in mind, start with the extra networks you think you will use and work up from there. If you need to add one, do it early, or be willing to accept that things may change drastically.
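
For the scripted route, diffusers can layer LoRAs and Textual Inversions on top of a base pipeline. This is only a sketch; the file names, token, and prompt below are placeholders for whatever you have actually downloaded:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Layer extra networks on top of the base model (paths and names are placeholders).
pipe.load_lora_weights("./loras", weight_name="painterly_style.safetensors")
pipe.load_textual_inversion("./embeddings/my_character.pt", token="my_character")

image = pipe("my_character standing in a field, painterly").images[0]
image.save("extra_networks.png")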

Beyond that, there are Extensions. These are more advanced plug-ins that let you do some pretty crazy stuff. Some will let you change the pose of a character in very exact ways, including the hands, and others will let you select specific colors for specific parts of the image. While these are more difficult to make, because they involve actual programming, they give you a huge amount of flexibility when it comes to making your image match what’s in your head.

The most common of these is “ControlNet,” which lets you use images to drive specific changes, like poses or depth maps. One I have found very useful is “Latent Couple,” which lets you specify regions of the image, by grid or by mask, to draw specific things in the same scene. This is excellent for multiple subjects, although it can be a lot of work to figure out, because it requires specific prompts and prompt organization to work properly. It works best when coupled with an accompanying program (called “LatentCoupleHelper”) that lets you visually specify where on a grid you want your subjects.
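
ControlNet also has a scripted equivalent in diffusers, where a small companion model reads your reference image (a pose skeleton, depth map, and so on) alongside the checkpoint. A sketch using a public openpose ControlNet; the pose image path and prompt are placeholders:

import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# A ControlNet trained on pose skeletons (public example ID).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The pose image steers the character's pose while the prompt handles the rest.
pose = Image.open("./poses/standing_pose.png")
image = pipe("a knight in ornate armor", image=pose).images[0]
image.save("controlnet_pose.png")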

Created in Stable Diffusion for “8-Bit Bootup (WT)”

Impress with Inpaint

Once you have gotten an image close to what you want, you still have more options. While most people might be tempted to bring their picture into Photoshop or Krita and fix up the little mistakes, there are tools that will improve your existing AI image directly. A feature called “Inpainting,” part of the “img2img” tools, lets you make repairs or changes to your image in different ways. In one method, you draw a mask straight onto your image, tell the AI what you want there, and it does its best to change it. You can even use an image as a template for another image, like building a new character turnaround from an existing one. There are many ways to correct your image this way, and some Checkpoints, Extra Networks, and Extensions are designed specifically to be used with inpainting.

The important thing to keep in mind is that the prompts, seed, and everything else still affect how the inpainting looks. Keeping the same prompts for the style and lighting will help keep your changes consistent with the rest of the image. This tool is not limited to the image you are currently rendering, either. You can use it on any image you have: just upload the image into the interface and make the same kinds of changes, which lets you alter an existing image for better or worse. As great as it is, though, sometimes it’s just easier to take it into Photoshop.
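
Inpainting has a dedicated pipeline too if you script it: you hand it the original image plus a black-and-white mask of the area to redo. A sketch with a public inpainting checkpoint; the image and mask paths and the prompt are placeholders:

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

# White areas of the mask get repainted; black areas are left alone.
original = Image.open("./renders/dragon_cat.png")
mask = Image.open("./renders/hand_mask.png")

fixed = pipe("a well-drawn cat paw", image=original, mask_image=mask).images[0]
fixed.save("dragon_cat_fixed.png")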

Extras! Extras!

FINALLY, there is the Extras feature. This is really just another AI used to upscale the image. Because these images can wreak havoc on your GPU, or at least max out your VRAM, you usually can’t make them very large, but there are a few ways to make them bigger. The first is to increase the size while you are making the image, using the Hires Fix. This uses an AI to multiply the resolution of your image while it is rendering, which theoretically gives you more detail in the final image. The only drawback is that it tends to use a LOT of VRAM and will often fail. If you can use this feature, it is likely to give you the best results; however, I have had a lot of difficulty with it and can only increase the image so much before I have to resort to other methods.

The second way is to use the upscaler after the image has been made. Just like with inpainting, you can use this on any image to increase its size without losing quality, at least in theory. Typically it does a good job without using up all your VRAM. And just like the image-generating AI, the upscaling AI has its own methods, called Upscalers. Some of these are great for photography, others are better for illustrations, and others still are amazing at stylized anime. What your image looks like will determine which upscaler, or combination of upscalers, you will want to use. Once you figure it out, you can get a 4k (or larger) image in seconds.
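
The scripted version of this step is a separate upscaling pipeline that takes the finished image and a prompt together. A sketch using a public diffusion-based 4x upscaler; it isn’t the same as the ESRGAN-style upscalers a webui typically offers, but it shows the idea of a dedicated second pass for resolution, and the input path and prompt are placeholders:

import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

# A dedicated 4x upscaler (public example ID); it is VRAM-hungry too.
pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

low_res = Image.open("./renders/dragon_cat_fixed.png")
upscaled = pipe(prompt="a cat riding a flying dragon, extremely detailed", image=low_res).images[0]
upscaled.save("dragon_cat_4x.png")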

Easy Peasy, Right?

So that’s all it takes to make a good AI-generated image that you had any control over. Just pick the best Checkpoint model for your needs, use the right prompts, don’t forget the negative prompts, be sure to get them in the right order, add weights to them, choose your sampling method and steps, don’t forget to adjust your CFG Scale, make sure you know what seed you want or leave it random and render large batches until you find one you like, add any Hypernetworks, LoRAs, and Embeddings and don’t forget to add their prompts, double-check your extensions and be sure to pick the right pose, composition, and colors, and be careful not to overload your GPU and VRAM. Once you’ve done that, fix it up with Inpainting, and then increase the resolution for the final result. So easy a caveman could do it, right?

To be fair, a lot of the interfaces out there, especially the online ones, do the heavy lifting for you, but that also means they take away most of the control. If you want a specific image, it will be much harder without that control. Basically, there are two types of generative AI producers: the people who put in minimum effort and get something that looks cool, and the people who put in a lot of effort and get something that kind of, sort of looks like what they were going for. In the hands of a creative, AI as an art tool is not that much better than any other tool, and in some ways it’s worse. Sure, you can get something quick and easy, but to make it your own, you have to put in the work, and that can take a long time.
