SuperMix is an anime-focused text-to-image diffusion model capable of bringing out semi-realistic tones through detailing, lighting, textures, and other aspects of the composition. At the same time, this merged model is very versatile in the styles, forms, and mediums you can steer outputs toward with your chosen parameters. SuperMix is great with:
Portraits
Anime
Semi-Realism
Scenery
Concept Art
Detailed Textures
Detailed Backgrounds
Vehicles, Architecture, Food
& More!
This mix started out as a spontaneous combination of various anime-focused models. I took note of some of the details the merge excelled at, then decided to create a mix highlighting those aspects and continued from there. After some iterations and branch tests, I decided this mix was decent enough to share with others as is, without going too far with variations.
I still consider myself fairly new to generated art in general, so if you see anything to correct or improve upon, let me know!
I would love to see what people create with this model's outputs; feel free to use the tag #SuperMix on various platforms if you decide to post anything!
Note
SuperMix1 is an older rough merge mixed at the end of 2022 from various models known at the time. As such, this model and its merge components are fairly dated and may be harder to manage at times with current webUI updates. There are many great models available now with similar styles and flexibility that may be easier to use, depending on your style preference. If this model receives any future updates, a new version will be geared toward ironing out prevalent issues in this version, removing license limitations, and finetuning to a better standard.
Alternate versions, model recipe(s), and more information can be found on the Hugging Face page.
This model is fairly versatile when it comes to a general use configuration and parameters.
In short, I suggest starting simple and experimenting with what works best for you and your prompt at the time. Try some older prompts and configurations, one of the examples, or start from scratch and go from there. This model really shines with a good prompt. You may experience some messy anatomy/hands until you find a good configuration and prompt; you'll know when you do. Keep in mind this model is geared more toward portrait-style generations.
There are many different examples of various configurations used in the previews and example sections - feel free to explore your own styles. SuperMix can be a very powerful model capable of many different styles, don't be afraid to use this model(s) the way you find best.
An additional img2img upscale at lower denoising values can do really well and really bring a clean polish to output images. Keep in mind you may lose some very fine detailing depending on your parameters, though you can also merge two upscales together for the best of both ranges.
~20 steps, 7 scale, ~0.4 denoising, clip skip 1 or 2, is a good starting point
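The "merge two upscales" idea above can be sketched as a simple weighted blend of the two passes. This is an illustrative sketch only (not part of any webUI script), and it assumes both upscales were rendered at the same final resolution:

```python
import numpy as np

def blend_upscales(img_a, img_b, alpha=0.5):
    """Blend two upscaled images given as uint8 RGB arrays.

    img_a: e.g. a low-denoise pass that keeps fine texture detail
    img_b: e.g. a higher-denoise pass with a cleaner polish
    alpha: weight toward img_a (0.5 = even mix of both ranges)
    """
    a = np.asarray(img_a, dtype=np.float32)
    b = np.asarray(img_b, dtype=np.float32)
    if a.shape != b.shape:
        raise ValueError("both upscales must share the same resolution")
    # Weighted per-pixel average, then back to 8-bit.
    return (alpha * a + (1.0 - alpha) * b).astype(np.uint8)
```

In practice you could do the same thing in an image editor by stacking the two upscales as layers and lowering the top layer's opacity.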
This model has some issues with detail accuracy; however, it is primarily geared toward portrait styles. Keep in mind hands/anatomy may be a bit wild at times depending on your prompt and other parameters.
SuperMix can excel with both simple and complex prompt styles. Start simple and small, then expand from there. Prompts are, in my opinion, one of the largest factors in a generation. Be keen about what you're using and how you're using it; what may conflict with something else; and how everything plays together with other parameters (i.e., sampler, steps, scale, clip skip, seed, LoRA, etc.).
Note: artist tokens can hold a lot of weight in outputs, use at your own discretion.
Positive Prompts: Simple goes a long way as a starting point but you can really direct the model style-wise with some added structure. Try anything you find that works well with your other parameters. Here are a few starting points.
(masterpiece:1.1), (highest quality:1.1), (HDR:1.0)
extreme quality, cg, detailed face+eyes, (colorful:0.8), <content>, masterpiece, 8k, tone mapping, hyper focus
Negative Prompts: This model can do well with a simple negative prompt or a negative embedding(s), but it can also do really well with some structure in the negative prompt for styling direction, undesired qualities, etc. Watch for tokens that conflict with your positive prompt and avoid making it overly complex, but try anything that works!
(bad quality:1.3), (worst quality:1.3)
EasyNegative, (bad_prompt_version2:0.8), (badhandv4:1.18), (bad quality:1.3),
(worst quality:1.3), watermark, (blurry), (cropped), (nsfw:1.3), (cleavage:1.3)
You can check the preview images for more examples.
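For reference, the `(token:weight)` notation in the prompts above is webUI attention weighting. Below is a much-simplified sketch of how such a prompt might be read; the real webUI parser also handles nested emphasis, square brackets, and escaped parentheses, so treat this only as an illustration of the idea:

```python
import re

# "(token:1.3)" sets an explicit weight; bare "(token)" bumps the
# weight to 1.1; anything else defaults to 1.0. (Simplified sketch.)
WEIGHTED = re.compile(r"\(([^():]+):([0-9.]+)\)")
PAREN = re.compile(r"\(([^():]+)\)")

def parse_prompt(prompt):
    """Return (token, weight) pairs from a comma-separated prompt."""
    result = []
    for part in (p.strip() for p in prompt.split(",") if p.strip()):
        m = WEIGHTED.fullmatch(part)
        if m:
            result.append((m.group(1).strip(), float(m.group(2))))
            continue
        m = PAREN.fullmatch(part)
        if m:
            result.append((m.group(1).strip(), 1.1))
        else:
            result.append((part, 1.0))
    return result
```

For example, `parse_prompt("(worst quality:1.3), watermark, (blurry)")` yields `worst quality` at 1.3, `watermark` at 1.0, and `blurry` at 1.1.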
Hires Denoising: I tend to use a range of ~0.3-0.6, though I haven't tried much else so far. Experiment to see what works best for your parameters and prompts at the time.
Hires Upscaler: Upscalers seem to produce slightly different results from one another, though I find any of them work. I'm not sure what is typically used; I mainly use R-ESRGAN 4x+ Anime6B or 4x-UltraSharp. Use what you think is best, as always.
I suggest starting with ~18-30 step values, you can go lower or higher and see what works well with your prompt, sampler and other parameters.
Most of my tests with this model were using samplers:
Euler a
DPM++ 2M Karras
DPM++ SDE Karras
DDIM
I also tried a bit of DPM++ 2S a Karras, and PLMS samplers.
I am unsure about the rest. Each sampler has its own styling and plays differently with your prompt and other parameters at the time.
I suggest trying out what you typically use, then try out some of the others and see how they play with your other configurations and prompt.
Do note that some samplers may make use of certain terms/tokens of your prompt and other parameters differently than others. You may find better results with one sampler and "prompt a", then better results with another sampler and "prompt b" etc.
CFG Scale may largely depend on your prompt, sampler, etc. I generally suggest starting at the default of 7 and adjusting from there (~6.5-10).
I have had good results with higher scales (~13-16) on samplers such as DDIM, depending on the prompt and other factors. This is not to say lower values don't work as well; the same goes for other samplers and value ranges.
Experiment and see what works best for you and your prompt!
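For intuition, the CFG Scale slider corresponds to classifier-free guidance: the sampler's prediction is pushed away from the unconditional result and toward the prompt-conditioned one. A minimal sketch of that combination step (shown on plain numbers; in practice it operates on full noise-prediction tensors):

```python
def cfg_combine(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance combination.

    guidance_scale is the "CFG Scale" value: 1.0 effectively disables
    guidance, while higher values follow the prompt more literally
    (and can oversaturate or distort at extremes).
    """
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```

This is why very high scales can "burn" an image: the conditional direction gets amplified well past the model's own prediction.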
Clip Skip 1 - great with most samplers, especially Euler a in my experience.
Clip Skip 2 - also great with most samplers, tends to be more 'literal' with various tokens in a prompt depending on sampler and other parameters.
Both work great and each will produce different styles and results; this is part of the reason I didn't go with some of the other test model variations: the quality was imbalanced between the two clip skip settings. I suggest trying both, or even comparing them in the same generation with the built-in X/Y/Z plot script.
You can always try higher as well, I have seen some good results with Clip Skip 3-6.
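Conceptually, clip skip just selects which CLIP text-encoder layer's output is fed to the diffusion model, counting back from the final layer. A minimal sketch of the selection (the list of layer outputs here is a stand-in for the encoder's real hidden states):

```python
def select_clip_layer(hidden_states, clip_skip=1):
    """Pick the CLIP text-encoder layer that conditions the model.

    hidden_states: per-layer outputs, index -1 = final layer.
    Clip Skip 1 uses the final layer; Clip Skip 2 stops one layer
    early (common for anime-trained models); higher values stop
    earlier still, giving a more "literal" read of tokens.
    """
    if clip_skip < 1:
        raise ValueError("clip skip counts from 1")
    return hidden_states[-clip_skip]
```

So Clip Skip 2 simply swaps the final layer's embedding for the penultimate one, which is why the two settings can style the same prompt differently.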
Use any VAE you prefer. I typically use vae-ft-ema-560000-ema.
"SuperMix_A.vae" (renamed SD vae-ft-ema-560000-ema.vae)
Recommended - bright vivid/bold colors
"SuperMix_B.vae" (renamed kl-f8-anime2.vae, found on the Hugging Face page)
Very Similar - different details at times
"SuperMix_C.vae" (renamed Anything_v3.vae, found on the Hugging Face page)
Another option - moderate colors/saturation in comparison
vae-ft-mse-840000-ema and ClearVAE_V2.3 can also be good options.
Note: model names containing "-bv" or "-bakedVAE" have a VAE baked in, so a separate VAE file is not needed for those versions.
A secondary img2img upscaling after generation can really bring out clarity in images and iron out details with this model. Keep in mind this can also soften some texturing detail depending on your settings. This is not needed of course, but can really sharpen up some generations. Use the settings or extension(s) that work best for you.
I generally use the built in SD upscale script with:
the same base model
the same or similar prompt
DPM++ SDE Karras sampler
20 sampling steps
7 cfg scale
a low denoising strength ~0.08-0.3
a random seed, -1
tile overlap ~176-208
scale factor x2
upscaler R-ESRGAN 4x+ Anime6B or 4x-UltraSharp
LoRA usually turned off
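The tile overlap value above interacts with the tile size to determine how many tiles the SD upscale script renders. A rough sketch of the grid math follows; the script's exact formula may differ slightly, and the 512-pixel tile size is an assumption here, not something stated above:

```python
import math

def tile_grid(width, height, tile_w=512, tile_h=512, overlap=192):
    """Estimate the tile grid for a given upscaled output size.

    Neighbouring tiles overlap by `overlap` pixels, so each new tile
    only advances by (tile size - overlap). More overlap gives
    smoother seams but more tiles, i.e. longer render times.
    """
    stride_x = tile_w - overlap
    stride_y = tile_h - overlap
    cols = max(1, math.ceil((width - overlap) / stride_x))
    rows = max(1, math.ceil((height - overlap) / stride_y))
    return cols, rows
```

For example, a 1024x1024 output with 512 tiles and 192 overlap works out to a 3x3 grid, while anything at or below the tile size renders as a single tile.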
I've only used the webUI defaults:
0 Eta noise seed delta
0 Eta for DDIM (noise multiplier)
1 Eta for ancestral samplers (noise multiplier)
For the example images I used the settings/compatibility/"use old karras scheduler sigmas (0.1 to 10)" compatibility setting, which affects Karras samplers. This is completely optional and shouldn't be needed; it better replicates some of the older webUI versions. I have not personally tested enough with this setting turned off on the newer webUI versions.
Disclaimer
This model(s) may output NSFW content unintentionally depending on the parameters used. Make sure you tailor your prompts accordingly, for example by adding "nsfw" to the negative prompt.
The purpose of sharing this model is not to showcase obscene material in a public forum. The use of this learning model is entirely at the discretion of the user, who has the freedom to choose whether to create SFW or NSFW content. That decision lies with the user and their own personal preferences. The AI model(s) do not contain explicit visual content that can be accessed easily.
vae-ft-ema-560000-ema vae
4x-UltraSharp or R-ESRGAN 4x+ Anime6B upscalers
hires fix w/ some upscaled again using SD upscale
use old karras scheduler sigmas (0.1 to 10) compatibility setting
any LoRa use should be listed in the image metadata
The Bad_v2 negative embedding is bad_prompt_version2 renamed
EasyNegative and badhandv4 negative embeddings were also used
*Negative embeddings are optional
Note
SuperMix1 was originally merged and tested on a much older version (https://github.com/AUTOMATIC1111/stable-diffusion-webui/tree/4b3c5bc24bffdf429c463a465763b3077fe55eb8) of Automatic1111 WebUI. Because of this, I suggest enabling settings/compatibility/"use old karras scheduler sigmas (0.1 to 10)" when using Karras samplers or trying to recreate some example images. This is completely optional and shouldn't be needed; I have not personally tested enough with this setting turned off on the newer webUI versions.
This model is open access and available to all, with a modified CreativeML OpenRAIL-M license further specifying rights and usage.
1. You can't use the model to deliberately produce nor share illegal or harmful outputs or content.
2. The author claims no rights on the outputs you generate; you are free to use them and are accountable for their use, which must not go against the provisions set in the license.
3. You may re-distribute the weights. If you do, please be aware you have to include the same use restrictions as the ones in the license and share a copy of the modified CreativeML OpenRAIL-M to all your users (please read the license entirely and carefully).
Please read the full licenses: Stable Diffusion and Dreamlike Diffusion 1.0.
(RT variant is not subject to Dreamlike Diffusion license)
Use Restrictions
You agree not to use the Model or Derivatives of the Model:
- In any way that violates any applicable national, federal, state, local or international law or regulation
- For the purpose of exploiting, harming or attempting to exploit or harm minors in any way
- To generate or disseminate verifiably false information and/or content with the purpose of harming others
- To generate or disseminate personal identifiable information that can be used to harm an individual
- To defame, disparage or otherwise harass others
- For fully automated decision making that adversely impacts an individual's legal rights or otherwise creates or modifies a binding, enforceable obligation
- For any use intended to or which has the effect of discriminating against or harming individuals or groups based on online or offline social behavior or known or predicted personal or personality characteristics
- To exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm
- For any use intended to or which has the effect of discriminating against individuals or groups based on legally protected characteristics or categories
- To provide medical advice and medical results interpretation
- To generate or disseminate information for the purpose to be used for administration of justice, law enforcement, immigration or asylum processes, such as predicting an individual will commit fraud/crime commitment (e.g. by text profiling, drawing causal relationships between assertions made in documents, indiscriminate and arbitrarily-targeted use).
- To generate NFTs
Terms of use
- You are solely responsible for any legal liability resulting from unethical use of this model(s)
- If you use any of these models for merging, please state what steps you took to do so and clearly indicate where modifications have been made.
Note
If you see any conflicts or corrections to be made, please let me know.