Introducing Plane Helper! A multi-concept LyCORIS (trained as a LoHa) for v2.1. Plane Helper is trained on roughly 2,300 images of 71 different plane types. I would like to share my training process for anyone interested in learning about training a multi-concept LoHa on complex concepts. This is by no means meant to be scientific; it is meant to help others who might be facing some of the same roadblocks I did.
Dataset:
I've created a data table that contains all 71 plane types in this model along with their respective key tokens which can be viewed here:
https://docs.google.com/spreadsheets/d/1N7o4pc9mGyYSYIoeD4fU_WpO_2tL_JfKDETOJPVQw18/edit?usp=sharing
The project started after I realized how bad 2.1 was with planes. They would come out with wings in the wrong places, or extra wings, and with very little detail; sometimes they were nothing more than a blob of pointy geometric shapes. I aimed to do something about it. I started by grabbing about 130 images of cool-looking planes and running them through a batch interrogator for captioning. I trained with Kohya and found that no matter how much I trained, I would still get the same issues. I suspected the problem was that I had too many different objects and was confusing the LoHa by not differentiating them, calling them all simply planes or jets. So I decided to identify each type of plane and inject its name into its respective captions.
I didn't know much about planes, but I wanted 2.1 to be able to make awesome images of them, so running each image through images.google.com to identify them all was a tedious but necessary exercise. I ended up with 71 different plane types. Instead of picking two and moving on, I thought I would see what this LoHa model type was made of, so I continued building my dataset with at least 30 images of each plane type, with the goal of training them all into one LoHa. There were some plane types for which I could not find 30 decent images, but I decided to leave them in the dataset because the ultimate purpose was to train a model on planes in general, and keeping them might help. By the time I achieved this goal, I had about 2,300 images total for 71 plane types, all organized neatly into their own folders. Then came the captioning process.
After running these through batch interrogation using Captionr, I googled each plane type and made an Excel table with its name and a brief but detailed description of the aircraft. I then brought each folder into the Dataset Tag Editor extension one by one, cleaned up the captions, and injected the tokens from the Excel table into each one. I also added a debug "CHV3CPlane" token to every caption, plus a "CHV3CTiltRotor" token for the tiltrotor planes (because they are a very different shape). These debug tokens work great for making unique planes as well. I enabled weighted captions so that the debug tokens, along with the plane-type identifier tokens, carried extra weight. For example, a caption might look something like this:
"(Hawker Hunter), transonic jet-powered fighter aircraft, Hawker Aircraft, Royal Air Force (RAF), Avon turbojet engine, swept wing, (CHV3CPlane), a plane is flying in the air, swiss, low level, man, viewed in profile from far away, switzerland, illustration, overhead".
After this long and patience-testing process, I finally had a decent dataset to begin training with.
Repeats:
I initially set the repeats for each folder to 5, but some plane types had fewer images than others, so I did a rough balancing: plane types with more images ran for 5 repeats, whereas a set with few images might run for 10-15.
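Kohya reads the repeat count from a numeric prefix on each image folder's name (e.g. 5_HawkerHunter), so this rough balancing is easy to script. A minimal sketch, assuming un-prefixed folders and a made-up target for images x repeats per plane type:

from pathlib import Path

DATASET_ROOT = Path("dataset")  # hypothetical: one un-prefixed subfolder per plane type
TARGET = 150                    # assumed target for images x repeats per plane type
MIN_REPEATS, MAX_REPEATS = 5, 15
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

for folder in sorted(p for p in DATASET_ROOT.iterdir() if p.is_dir()):
    n_images = sum(1 for f in folder.iterdir() if f.suffix.lower() in IMAGE_EXTS)
    repeats = max(MIN_REPEATS, min(MAX_REPEATS, round(TARGET / max(n_images, 1))))
    print(f"{folder.name}: {n_images} images -> {repeats} repeats")
    # Rename to the N_name form that Kohya expects.
    folder.rename(folder.with_name(f"{repeats}_{folder.name}"))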
Training method:
I went about this project in a somewhat unorthodox way. With a dataset this big, I naturally had issues with overfitting. For LoHa a network dim of 8 is recommended, but I raised it to 32 to allow for more training capacity. I went through a few phases trying to find the settings that would let me train for more epochs before the model got too burnt. The original source model was v2-1_768-ema-pruned; however, with each training run, I would merge the LoHa from the previous run into v2-1_768-ema-pruned and use that merge as the source model for the next run. The theory is that the new run would inherit some knowledge from the previous one and get a bit of a head start. This theory needs more testing, preferably on a smaller dataset. After each training run, I would also merge the old source model with the newly merged source model and use that blend as the new source, and I would use Extract LyCORIS LoCON to extract a LoRA from it. This particular model is 4 LoHas from 4 different training runs (all on the same 71-plane dataset) merged together and extracted.
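For anyone curious what merging a LoHa into the source model actually does, here is a minimal sketch of the underlying math for a single weight matrix. The function names and toy shapes are mine, and in practice I used the merge and extract tools rather than anything hand-rolled. LoHa factorizes each weight update as a Hadamard (elementwise) product of two low-rank matrices, scaled by alpha/dim:

import torch

def loha_delta(w1_a, w1_b, w2_a, w2_b, alpha):
    # LoHa update: delta_W = (alpha / dim) * (w1_a @ w1_b) * (w2_a @ w2_b),
    # where * is elementwise and dim is the rank of the low-rank factors.
    dim = w1_b.shape[0]
    return (alpha / dim) * (w1_a @ w1_b) * (w2_a @ w2_b)

def merge_into(base_weight, w1_a, w1_b, w2_a, w2_b, alpha, strength=1.0):
    # Merging bakes the update directly into the base weight.
    return base_weight + strength * loha_delta(w1_a, w1_b, w2_a, w2_b, alpha)

# Toy example: a rank-32, alpha-4 LoHa on a 320x768 projection.
W = torch.randn(320, 768)
w1_a, w1_b = torch.randn(320, 32), torch.randn(32, 768)
w2_a, w2_b = torch.randn(320, 32), torch.randn(32, 768)
W_merged = merge_into(W, w1_a, w1_b, w2_a, w2_b, alpha=4.0)

Because the update is added straight into the base weights, each new run starts from a model that has already absorbed the previous run's planes, which is the head start described above.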
Training Specs:
I won't get too far into the weeds with training specs. Instead, I will post the settings I found best for this particular project and point you to three other guides that go through everything in much better detail:
THE OTHER LoRA TRAINING RENTRY
LoRA Training Guide
RFKTR's in-depth guide to Training high quality models
https://civitai.com/articles/397
My Specs:
{
"LoRA_type": "LyCORIS/LoHa",
"adaptive_noise_scale": 0,
"additional_parameters": "",
"block_alphas": "",
"block_dims": "",
"block_lr_zero_threshold": "",
"bucket_no_upscale": true,
"bucket_reso_steps": 64,
"cache_latents": true,
"cache_latents_to_disk": false,
"caption_dropout_every_n_epochs": 0.0,
"caption_dropout_rate": 0,
"caption_extension": ".txt",
"clip_skip": "1",
"color_aug": false,
"conv_alpha": 1,
"conv_alphas": "",
"conv_dim": 4,
"conv_dims": "",
"decompose_both": false,
"dim_from_weights": false,
"down_lr_weight": "",
"enable_bucket": true,
"epoch": 20,
"factor": -1,
"flip_aug": true,
"full_fp16": false,
"gradient_accumulation_steps": 2.0,
"gradient_checkpointing": true,
"keep_tokens": "0",
"learning_rate": 0.0001,
"logging_dir": "",
"lora_network_weights": "",
"lr_scheduler": "cosine",
"lr_scheduler_num_cycles": "",
"lr_scheduler_power": "",
"lr_warmup": "10",
"max_data_loader_n_workers": "4",
"max_resolution": "768,768",
"max_timestep": 1000,
"max_token_length": "150",
"max_train_epochs": "",
"mem_eff_attn": true,
"mid_lr_weight": "",
"min_snr_gamma": 5,
"min_timestep": 0,
"mixed_precision": "bf16",
"model_list": "custom",
"module_dropout": 0,
"multires_noise_discount": 0,
"multires_noise_iterations": 0,
"network_alpha": 4,
"network_dim": 32,
"network_dropout": 0,
"no_token_padding": false,
"noise_offset": 0,
"noise_offset_type": "Original",
"num_cpu_threads_per_process": 4,
"optimizer": "AdamW8bit",
"optimizer_args":
"output_dir": "",
"output_name": "",
"persistent_data_loader_workers": false,
"pretrained_model_name_or_path": "",
"prior_loss_weight": 1.0,
"random_crop": false,
"rank_dropout": 0,
"reg_data_dir": "",
"resume": "",
"sample_every_n_epochs": 0,
"sample_every_n_steps": 0,
"sample_prompts": "",
"sample_sampler": "euler_a",
"save_every_n_epochs": 1,
"save_every_n_steps": 0,
"save_last_n_steps": 0,
"save_last_n_steps_state": 0,
"save_model_as": "safetensors",
"save_precision": "bf16",
"save_state": true,
"scale_v_pred_loss_like_noise_pred": true,
"scale_weight_norms": 0,
"sdxl": false,
"sdxl_cache_text_encoder_outputs": false,
"sdxl_no_half_vae": false,
"seed": "",
"shuffle_caption": true,
"stop_text_encoder_training": 0,
"text_encoder_lr": 5e-05,
"train_batch_size": 2,
"train_data_dir": "",
"train_on_input": false,
"training_comment": "",
"unet_lr": 8e-05,
"unit": 1,
"up_lr_weight": "",
"use_cp": false,
"use_wandb": false,
"v2": true,
"v_parameterization": true,
"vae_batch_size": 0,
"wandb_api_key": "",
"weighted_captions": true,
"xformers": true
}