This checkpoint was trained on 1.4M images of normal to hyper-sized anime characters. It focuses mainly on breasts/ass/belly/thighs, but now handles more general tag topics as well. The dataset is about 60% anime and 40% furry images as of v8. See the change log article below for more version details.
The OG hyperfusion LoRA can be found here https://civitai.com/models/16928
Alternate HuggingFace link for these models
This space is for the Checkpoint only since Civitai doesn't let you have both under one model space :/
recommendations for v8:
negative: low rating, lowres, text, signature, watermark, username, blurry, transparent background, ugly, sketch, unfinished, artwork \(traditional\), multiple views
cfg: 8 (it needs less than the hyperfusion LoRA)
resolution: 768-1024 (closer to 768 for less body horror)
clip skip: 2 (see the code sketch after this list for these settings applied in diffusers)
styling: Try the new artist tags included in v8; all of them can be found in the tags.csv by searching for "(artist)"
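If you prefer scripting over the webui, here is a minimal sketch of those settings in diffusers. The checkpoint filename and prompt are placeholders, and passing clip_skip directly to the pipeline call needs a reasonably recent diffusers version:

```python
# Minimal sketch, assuming a local .safetensors copy of the v8 checkpoint.
# Filenames and the prompt are placeholders, not part of the release.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "hyperfusion_v8.safetensors", torch_dtype=torch.float16
).to("cuda")

negative = (
    "low rating, lowres, text, signature, watermark, username, blurry, "
    "transparent background, ugly, sketch, unfinished, artwork (traditional), "
    "multiple views"  # the \( \) escaping is only needed in the webui
)

image = pipe(
    "1girl, huge breasts, standing, cowboy shot",  # placeholder prompt
    negative_prompt=negative,
    guidance_scale=8,        # cfg: 8
    width=768, height=1024,  # stay near 768 to reduce body horror
    clip_skip=2,             # clip skip: 2 (supported in recent diffusers)
    num_inference_steps=28,
).images[0]
image.save("sample.png")
```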
Because hyperfusion is a conglomeration of multiple tagging schemes, I've included a tag guide in the training data download section. It will describe the way the tags work (similar to Danbooru tags), which tags the model knows best, and all my custom labeled tags.
For the most part you can use the majority of tags from Danbooru, r-34, and e621 that relate to breasts/ass/belly/thighs/nipples/body_shape.
The best method I have found for tag exploration is going to danbooru/r-34, copying the tags from any image you like, and using them as a base, because there are just too many tags trained into this model to test them all.
If you are not getting the results you expect from a tag, find other similar tags and include those as well. I've found that this model tends to spread its knowledge of a tag across related tags, so including more of them increases your chances of getting what you want.
Using the negative "3d" does a good job of making the image more anime like if it starts veering too much into a rendered model look.
Ass related tags have a strong preference for back shots; try a low-strength ControlNet pose to correct this (see the sketch after these tips), or try one or more of these in the negatives: "ass focus, from behind, looking back". The new "ass visible from front" tag can help too.
...more tips in tag docs
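For the ControlNet tip above, the idea is to pass a front-facing pose at low strength so it nudges the composition without overpowering the ass tags. A rough diffusers sketch, assuming you already have an openpose image and a local copy of the checkpoint (both file paths are placeholders):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# SD1.5 openpose ControlNet; checkpoint and pose image paths are placeholders.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_single_file(
    "hyperfusion_v8.safetensors", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

pose = load_image("front_facing_pose.png")  # an openpose skeleton facing the viewer

image = pipe(
    "1girl, huge ass, ass visible from front, standing",  # placeholder prompt
    negative_prompt="ass focus, from behind, looking back",
    image=pose,
    controlnet_conditioning_scale=0.4,  # low strength: guide the pose, don't dominate
    guidance_scale=8,
    width=768, height=1024,
).images[0]
image.save("front_view.png")
```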
2023/11/02: Next up I've actually managed to grow the dataset to 1 million images! But training it is going to take a while, so expect a long break between updates like usual. I still want to try this dataset on SDXL, but I think it will take 3-4 months to train at this scale :( So who knows.
This model took me months of failures and plenty of lessons learned (hence v7)! I would eventually like to train a few more image classifiers to improve certain tags, but all future dreams for now.
As usual, I have no intention of monetizing any of my models. Enjoy the thickness!
-Tagging-
The key to tagging a 500k dataset is to automate it all. Start with the wd-tagger (or a similar danbooru tagger) to append common tags on top of the tags scraped with the images from their source site. Then I trained a handful of image classifiers (breast size, breast shape, innie/outie navel, directionality, motion lines, etc.) and let those do additional tagging. Finally, convert similar tags into one single tag as described in the tag docs; this especially helps with low-count tags.
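A rough sketch of that pipeline in Python. The tagger and classifier calls are stand-ins (run_wd_tagger, the classifier dict, and the ALIASES map are hypothetical names, not anything shipped with the model); the point is just the order of operations: scraped tags, then wd-tagger tags, then classifier tags, then alias consolidation.

```python
# Sketch only: run_wd_tagger() and the classifiers are placeholder wrappers around
# whatever tagger/classifiers you actually run; ALIASES is an illustrative map.
from pathlib import Path

# Collapse similar/rare tags into one canonical tag (as described in the tag docs).
ALIASES = {
    "enormous breasts": "huge breasts",   # illustrative examples only
    "massive ass": "huge ass",
}

def run_wd_tagger(image_path: Path) -> set[str]:
    # Placeholder: call the wd-tagger (or similar danbooru tagger) here.
    return set()

classifiers = {
    # Placeholders: each returns the single predicted tag for its body part/concept.
    "breast_size": lambda p: "huge breasts",
    "navel": lambda p: "outie navel",
}

def tag_image(image_path: Path, scraped_tags: set[str]) -> set[str]:
    tags = set(scraped_tags)                   # 1. tags scraped from the source site
    tags |= run_wd_tagger(image_path)          # 2. append common danbooru-style tags
    for name, clf in classifiers.items():      # 3. custom single-class classifiers
        tags.add(clf(image_path))
    return {ALIASES.get(t, t) for t in tags}   # 4. merge similar tags into one
```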
If only there were an r-34 or e621 style tagger available.
I used this to train my image classifiers
https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification
Ideally I should train a multi-label (many classes per image) classifier like the Danbooru tagger, but for now these single class-per-image classifiers work well enough.
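For reference, here is the gist of that example script written out as a minimal Trainer sketch. The paths, base model, and hyperparameters are illustrative, not the exact settings used for hyperfusion's classifiers. It expects an imagefolder layout where each class (e.g. each breast size) is its own subdirectory:

```python
import torch
from datasets import load_dataset
from transformers import (AutoImageProcessor, AutoModelForImageClassification,
                          Trainer, TrainingArguments)

# Illustrative settings; swap in your own data_dir, base model, and hyperparameters.
base = "google/vit-base-patch16-224-in21k"
ds = load_dataset("imagefolder", data_dir="data/breast_size")  # one folder per class

labels = ds["train"].features["label"].names
processor = AutoImageProcessor.from_pretrained(base)
model = AutoModelForImageClassification.from_pretrained(
    base,
    num_labels=len(labels),
    id2label={i: l for i, l in enumerate(labels)},
    label2id={l: i for i, l in enumerate(labels)},
)

def preprocess(batch):
    # Convert PIL images to pixel_values on the fly.
    batch["pixel_values"] = [
        processor(img.convert("RGB"), return_tensors="pt")["pixel_values"][0]
        for img in batch["image"]
    ]
    return batch

ds = ds.with_transform(preprocess)

def collate(examples):
    return {
        "pixel_values": torch.stack([e["pixel_values"] for e in examples]),
        "labels": torch.tensor([e["label"] for e in examples]),
    }

trainer = Trainer(
    model=model,
    args=TrainingArguments("out/breast_size", remove_unused_columns=False,
                           per_device_train_batch_size=32, num_train_epochs=3,
                           learning_rate=2e-5),
    train_dataset=ds["train"],
    data_collator=collate,
)
trainer.train()
```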
-Poor Results-
For a long time I was plagued with subpar results. I suspected the data was just too low quality, but in the end the problem was poorly tagged images. Sites like r-34 tend to pile too many size tags onto an image, like "big breasts, huge breasts, hyper breasts" all on the same image, which is not great for a model where you want specific sizes. Using the classifiers I mentioned above, I limited each image to a single size tag for each body part, and the results were night and day.
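In other words, when a scraped image carries several sizes for the same body part, keep only the one the classifier predicts. A tiny sketch of that filtering (the tag group and the classifier output here are illustrative):

```python
# Keep only one size tag per body part: drop every scraped size tag in the group
# and keep the classifier's prediction instead. The tag group is illustrative.
BREAST_SIZES = {"small breasts", "medium breasts", "large breasts",
                "big breasts", "huge breasts", "gigantic breasts", "hyper breasts"}

def enforce_single_size(tags: set[str], predicted: str,
                        size_group: set[str] = BREAST_SIZES) -> set[str]:
    return (tags - size_group) | {predicted}

scraped = {"1girl", "big breasts", "huge breasts", "hyper breasts", "standing"}
print(enforce_single_size(scraped, predicted="huge breasts"))
# -> {'1girl', 'standing', 'huge breasts'}
```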
2023/08/13 Coming back to this after even more experience in labeling/training, I still agree with the above statement. As I tag more and more images, the model becomes more reliable with prompts. You can see this clearly with the new bottomheavy, topheavy, bellyheavy tags. It makes it easier to generate specific body types, and helps the model understand what you want from your prompt. I didn't have to add additional images to make these tags work. I just improved tagging.
-Tag Bleeding-
An example of tag bleeding is using the tag "gigantic breasts" but ending up with everything being gigantic: breasts, ass, and thighs. It's been an annoying problem, and the only (partial) solution I have found is to segment the size tags for each body part, e.g. keeping "huge" for one part and using something like "size 200" for another (a scheme that is not used in this checkpoint), so they don't share the same "huge" size tag.
I trained a model earlier with this tagging scheme and it did work a little better, but there was still some tag bleeding. I think this is because you often have images where the character has both large breasts and a large ass in the same image, so both concepts end up bleeding into each other regardless.
The reason I chose not to segment the size tags is that it would make prompting more difficult, since it deviates too much from the danbooru/r-34 tagging schemes.
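To make the trade-off concrete, here is what the two schemes look like side by side (the segmented tags are hypothetical and not used in this checkpoint, which sticks to the shared danbooru-style scheme):

```python
# Shared size words (what the model actually uses): "huge" appears for several
# body parts, so the size concepts can bleed into each other.
shared = ["huge breasts", "huge ass", "thick thighs"]

# Segmented sizes (hypothetical, not used in this checkpoint): each body part
# gets its own size token, e.g. the "size 200" style mentioned above, so
# nothing shares the word "huge".
segmented = ["breasts size 200", "ass size 150", "thighs size 90"]
```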
I wonder if tag bleeding is just a result of under-training the model?
-Testing-
In order to determine if a new model is better than the last, it's important to have some standard prompts that you can compare with. x/y plot is great for this. Just keep in mind that the same seed will produce totally different images between models, so you likely need to compare dozens of images at a time and not 1 to 1. It's also important to compare new models against the base model output to make sure what you are training is actually having an overall positive effect compared to the original model.
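Outside the webui's x/y plot, the same idea can be scripted: fix a set of standard prompts and seeds, run them through each checkpoint (including the base model), and judge the batches as a whole rather than image-to-image. A rough diffusers sketch, with checkpoint filenames and prompts as placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholders: swap in your own checkpoints, standard prompts, and seed count.
checkpoints = ["base_sd15.safetensors", "hyperfusion_v7.safetensors",
               "hyperfusion_v8_test.safetensors"]
prompts = ["1girl, huge breasts, standing", "1girl, huge ass, from side"]
seeds = range(8)  # compare dozens of images at a time, not 1 to 1

for ckpt in checkpoints:
    pipe = StableDiffusionPipeline.from_single_file(
        ckpt, torch_dtype=torch.float16
    ).to("cuda")
    for p_idx, prompt in enumerate(prompts):
        for seed in seeds:
            gen = torch.Generator("cuda").manual_seed(seed)
            img = pipe(prompt, guidance_scale=8, width=768, height=1024,
                       generator=gen, num_inference_steps=28).images[0]
            img.save(f"{ckpt.rsplit('.', 1)[0]}_p{p_idx}_s{seed}.png")
    del pipe
    torch.cuda.empty_cache()
```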
2023/08/13 The hardest part about testing is determining when you have overcooked your text encoder. At some point the model's ability to interpret the prompt starts to degrade when training the text encoder at a high enough LR. I've seen it happen with smaller and larger models alike. Maybe I should stop training at some epoch and restart without the TE enabled. I've tried training without the text encoder many times, but the results are always subpar; concepts that are foreign to the base model are understood much better with the TE enabled during training.
-Software/Hardware-
The training was all done on a 3090 in an Ubuntu docker instance. The software was Kohya's trainer using the LoRA network with conv_dim and lots of patience.