This is a LoRA for a realistic pink piggy.
I use the Kohya_SS GUI to train my LoRAs.
Some Key Settings:
"max_resolution": "768,768"
(if your GPU is really powerful, using 1024,1024 will be even better)
"learning_rate": "3e-6",
"lr_scheduler": "cosine_with_restarts",
"lr_warmup": "5",
"train_batch_size": 4,
"epoch": "5",
(May 15 2023: "epoch": "50" with file name 2_xxx)
"save_every_n_epochs": "40",
"mixed_precision": "bf16",
"save_precision": "bf16",
"enable_bucket": true,
"use_8bit_adam": false,
"text_encoder_lr": "1e-5",
"unet_lr": "5e-5",
"network_dim": 128,
"clip_skip": 2,
"network_alpha": 64,
"persistent_data_loader_workers": true,
"optimizer": "Lion"
If your hardware does not support my settings, try reducing the batch size to 2 or 1; this will increase the training time dramatically. If you want a shorter training time, using epoch 2 will solve that.
Personally, I use epoch 5 because my hardware supports it and I think a higher epoch count might capture more detail from the source images, but it is not necessary.
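To see why these knobs trade off against each other, here is a rough sketch of how total optimizer steps come out of the numbers above. It assumes 50 source images and the `50_xxx` repeat prefix from this guide, and that steps per epoch are roughly images × repeats ÷ batch size (an approximation of how Kohya counts them, ignoring bucketing details):

```python
# Rough estimate of total optimizer steps for a Kohya LoRA run.
# Assumption: steps_per_epoch ~= (images * repeats) / batch_size.

def total_steps(n_images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Approximate optimizer steps for one training run."""
    steps_per_epoch = (n_images * repeats) // batch_size
    return steps_per_epoch * epochs

# My setting: 50 images, 50 repeats, 5 epochs, batch size 4
print(total_steps(50, 50, 5, 4))   # 3125 steps

# Dropping batch size to 1 quadruples the step count (hence slower):
print(total_steps(50, 50, 5, 1))   # 12500 steps

# Dropping epochs to 2 cuts it instead:
print(total_steps(50, 50, 2, 4))   # 1250 steps
```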
Image selection:
50 high-resolution images (at least 1024*1024 before cropping): around 25 close-ups, 20 upper-body shots, and 5 full-body shots. The more angles covered, the better.
Preprocess:
Auto focal point crop in the webUI, with cropping dimensions of at least 768*768 and up to 1536*1536.
Caption with BLIP and DeepDanbooru, and also use the stable-diffusion-webui-wd14-tagger extension for further captions. Remember to select "Append on existing caption" and also "Remove duplicated tag".
Then I use BooruDatasetTagManager to manually change the captions to my desired ones.
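BooruDatasetTagManager handles the tag cleanup in a GUI; for illustration only, the same de-duplication idea can be sketched in a few lines of Python over the caption `.txt` files (the function name here is my own, not part of any tool):

```python
# Sketch: remove duplicated comma-separated tags from a caption string,
# keeping the first occurrence of each tag in its original order.

def dedup_tags(caption: str) -> str:
    seen = set()
    out = []
    for tag in (t.strip() for t in caption.split(",")):
        if tag and tag not in seen:
            seen.add(tag)
            out.append(tag)
    return ", ".join(out)

print(dedup_tags("pig, pink, outdoors, pig, wet, pink"))
# pig, pink, outdoors, wet
```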
Image file name:
50_xxx
(This means each image in the folder is repeated 50 times per epoch)
May 15 2023: 2_xxx, but with epochs increased to 50
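If the numeric folder prefix is new to you: Kohya reads the repeat count from the part of the folder name before the underscore. A minimal sketch of that parsing, plus the arithmetic comparing the two setups in this guide (assuming 50 images, and note they are not step-equivalent):

```python
# Kohya-style dataset folders are named "<repeats>_<name>", e.g. "50_xxx".

def parse_repeats(folder_name: str) -> int:
    """Read the repeat count from the folder name's numeric prefix."""
    prefix, _, _ = folder_name.partition("_")
    return int(prefix)

# Total image passes for 50 images under each setup:
old = 50 * parse_repeats("50_xxx") * 5    # 50 repeats x 5 epochs  -> 12500
new = 50 * parse_repeats("2_xxx") * 50    # 2 repeats x 50 epochs  -> 5000
print(old, new)
```

So the May 15 setup actually makes fewer total passes over the data; the higher epoch count mainly gives more epoch boundaries to save at.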
Then you should be ready to go!
However, because LoRA quality depends heavily on source image selection, you may need to test the resulting LoRA and revise the source images several times until you get a LoRA you are satisfied with. Also, since everyone's hardware is different, if you cannot run 768*768 with batch size 4, try 2 or even 1.
I think the LoRA result is really sensitive to the source images, so image selection and tagging are the parts that actually matter. You can often tell which source image is "bad" by trying out the LoRA; to improve things, simply delete that "bad" image from the source set. Sometimes a problem with the LoRA suggests you need to modify the tags instead. For example, if it always generates a "wet" effect, you might be missing "wet, rain, pool" tags on a related source image.
Check which base model your usual checkpoint is built on, then select your training base model accordingly. Good luck training!