I grew tired of the unchanging faces, poses, and styles of conventional AI art, so I wanted to break away from merged models. At first I relied on prompts, but I could never achieve certain subtle qualities of line, color, lighting, texture, composition, or storytelling, nor could I reproduce the stunning styles a model occasionally produced by chance. Those fleeting results differed only slightly from the generic style, yet they were aesthetically captivating. I therefore set out to build a model that could learn artistic styles faithfully and output them consistently. In November 2022 I began collecting material to train stylized models, using special tags to distinguish samples that differed only subtly, and by early 2023 the model had developed a distinct style of its own: the AIDv1.0 model.
Why fine-tune instead of training a LoRA? I have always believed that fine-tuning outperforms LoRA. It does not depend on a base model, and all training images move together toward the loss minimum during training, rather than merely optimizing an add-on block of weights. That said, I have also been exploring ways to fold specific styles cleanly into a large model in order to reduce the training burden.
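To make the comparison concrete, here is a minimal PyTorch sketch (my own illustration, not AID's actual training code) of what each approach optimizes: full fine-tuning lets gradients reach every weight of the network, while a LoRA freezes the base model and trains only a small low-rank add-on.

```python
import torch
import torch.nn as nn

# Full fine-tuning: every parameter receives gradients, so all training
# images jointly push the whole network toward the loss minimum.
def finetune_params(model: nn.Module):
    for p in model.parameters():
        p.requires_grad = True
    return model.parameters()

# LoRA: the base weights stay frozen; only the low-rank residual B @ A is
# trained, i.e. W_eff = W_frozen + (alpha / r) * B @ A.
class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the base model is never updated
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)
```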
Over the following six months I spent more than 20,000 RMB out of pocket, cropping images, tagging them, and hacking training scripts myself. Training step counts grew from thousands to tens of thousands to millions, and the training hardware went from an RTX 3060 to an RTX 3090 to an A100. From dataset preparation to training, AID gradually grew into a fully structured engineering project.
Along the way I found that the model learns a style best only when it slightly "overfits" to the noise in the source images. I deliberately overfit every style and used negative embeddings trained on that overfit noise to balance the learning progress across styles; this is how the bad, badhand, and aid series came about. This regularization approach served me well: a well-tuned negative embedding not only leaves the base model's style intact but actually reinforces a style's characteristics.
As the model iterated, I believe I gradually reached the ceiling of SD 1.5. Even with fine-tuning, the distinctive lines, colors, lighting, composition, and storytelling of those fine illustration styles are each so individual that a plain SD 1.5 model cannot learn and imitate them well. From underfitting to overfitting, I was never able to obtain perfectly stylized features, let alone while the model also had to optimize for more than a hundred artistic styles at once.
For that reason, I am very much hoping the more complex SDXL model will open up a new path for me.
While training the model I did not spend my energy on writing elaborate prompts or blending styles. Some users have achieved quite stunning results by combining a few LoRAs with very complex prompts, and I am grateful for their creativity and support.
Finally, thanks to @BananaCat for the Chinese localization of this article; I am glad to share and exchange results with SD enthusiasts around the world. The AID models are purely a professional hobby. If you are interested in more engineering details of dataset preparation and model training, or would like to share your own training recipes with me, please leave a comment and I will reply as soon as I can.
AnimeIllustDiffusion (AID) is a pre-trained, non-commercial, multi-style anime illustration model. It does NOT generate the "AI face". It has a large number of styles built in, and you can use special trigger words (see Appendix A) to generate images in a specific style. Because so much content is packed into it, AID needs strong negative prompts to work properly. Generic negative prompts (e.g. low quality, bad anatomy) have limited effect, so if your generated images come out noisy, use the model together with the negative text embeddings I provide [1] to cancel the noise. For version-specific negative text embeddings, see the version information. I also recommend the sd-vae-ft-mse-original VAE [5]; its bright colors suit the illustration style well. Part II briefly describes how AnimeIllustDiffusion V1.0 was made; Part III introduces the negative text embeddings; Appendix A provides the list of trigger keywords.
The AID model has more than 200 stable anime illustration styles and 100 anime characters built in. The special trigger words for styles are listed in Appendix A; to generate a character, simply use the character's name. AID works like a palette: you can create new styles by combining trigger words freely.
Sampler: Euler a
Steps: 40
Resolutions: 512x768, 640x960, 768x1152, etc.
CLIP skip: 1
Prompt format: best quality, masterpiece, highres, by {xxx}, best lighting and shadow, stunning color, radiant tones, ultra-detailed, amazing illustration, an extremely delicate and beautiful, {other prompts}
Negative prompt format: aid210, {other negative prompts}
Note: {xxx} is the style name, and aid210 is the model's dedicated negative text embedding; you can download it and learn how to use it at [1].
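As a concrete illustration of these settings, here is a minimal generation sketch with the diffusers library (my own example, not an official AID script; the local file names are assumptions):

```python
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

# Load the AID checkpoint (file name assumed) and switch to the Euler a sampler.
pipe = StableDiffusionPipeline.from_single_file(
    "animeIllustDiffusion_v210beta1.safetensors", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

# Register the dedicated negative embedding so "aid210" works as a token.
pipe.load_textual_inversion("aid210.pt", token="aid210")

image = pipe(
    prompt=(
        "best quality, masterpiece, highres, by wlop, best lighting and shadow, "
        "stunning color, radiant tones, ultra-detailed, amazing illustration, "
        "an extremely delicate and beautiful, 1girl"
    ),
    negative_prompt="aid210",
    num_inference_steps=40,
    width=512,
    height=768,
).images[0]
image.save("aid_sample.png")
```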
Each version of AID has its own strengths; a newer version is not necessarily better.
Good for first-time users: v2.8, v2.91 - Weak, v2.10beta1
Excellent creativity: v2.6, v2.7, v2.91 - Weak, v2.91 - Strong
Relatively stable: v2.5, v2.6, v2.8, v2.91 - Weak
Diverse styles: v2.91 - Weak, v2.91 - Strong, v2.10beta1
This model is merged from three different models: two that I trained and the Pretty 2.5D model merged by GoldSun [2].
I used 4300+ manually cropped and tagged 512x512 anime illustration images as the training set and fine-tuned the Naifu 7G base model with DreamBooth to learn the styles. I trained for 100 epochs per training image at a relatively high learning rate, used no regularization images, and also trained the text encoder. If you are interested, detailed parameter information can be found at [3].
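As a rough sketch of what such a fine-tune optimizes (my own diffusers-style illustration, not the author's script; see [3] for the real parameters), each step regresses the UNet's noise prediction on a captioned training image, with gradients also flowing into the text encoder:

```python
import torch
import torch.nn.functional as F

def training_step(unet, vae, text_encoder, noise_scheduler, batch):
    # Encode the image into latent space (SD 1.x scaling factor 0.18215).
    latents = vae.encode(batch["pixel_values"]).latent_dist.sample() * 0.18215

    # Sample a random timestep and add the corresponding amount of noise.
    noise = torch.randn_like(latents)
    t = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                      (latents.shape[0],), device=latents.device)
    noisy_latents = noise_scheduler.add_noise(latents, noise, t)

    # The text encoder is trained too, so its output keeps gradients.
    encoder_hidden_states = text_encoder(batch["input_ids"])[0]

    # Predict the noise and regress it. There is no prior-preservation
    # term, matching the fact that no regularization images were used.
    pred = unet(noisy_latents, t, encoder_hidden_states).sample
    return F.mse_loss(pred, noise)
```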
I merged the models with the Merge Block Weighted extension. Of the three models, one provides the style and the text encoder (base alpha and all OUT layers), one improves hand details (IN layers 00 - 05), and the third (Pretty 2.5D) provides the composition (IN layers 06 - 11 and the M layer).
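Conceptually (a simplified sketch of my own; the actual Merge Block Weighted extension differs in detail, and the author chained such merges across three models), a block-weighted merge interpolates two checkpoints with a separate ratio per UNet block instead of one global ratio:

```python
import torch

# Per-block ratios: 0.0 keeps model A, 1.0 takes model B. For example,
# taking IN 00 - 05 from the hand-detail model as described above.
IN_ALPHAS = [1.0] * 6 + [0.0] * 6   # input_blocks 00 - 11
MID_ALPHA = 0.0                     # middle block
OUT_ALPHAS = [0.0] * 12             # output_blocks 00 - 11
BASE_ALPHA = 0.0                    # text encoder and everything else

def block_alpha(key: str) -> float:
    """Map a checkpoint key to the merge ratio of its UNet block."""
    if "model.diffusion_model.input_blocks." in key:
        return IN_ALPHAS[int(key.split("input_blocks.")[1].split(".")[0])]
    if "model.diffusion_model.middle_block." in key:
        return MID_ALPHA
    if "model.diffusion_model.output_blocks." in key:
        return OUT_ALPHAS[int(key.split("output_blocks.")[1].split(".")[0])]
    return BASE_ALPHA

def merge(sd_a: dict, sd_b: dict) -> dict:
    """Per-block linear interpolation: (1 - alpha) * A + alpha * B."""
    return {k: torch.lerp(sd_a[k].float(), sd_b[k].float(), block_alpha(k))
            for k in sd_a}
```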
This model is recommended for use with badv3, a negative text embedding. It not only simplifies prompt writing but also draws out the model's potential and improves the quality of generated images. Usually badv3 alone is enough and no extra quality tags are needed, but it cannot solve 100% of image problems.
Place the downloaded negative text embedding file, badv3.pt, in the embeddings folder of your stable diffusion installation. After that, simply enter badv3 in the negative prompt field.
My idea was to train a concept of bad images and put it into the negative prompt to avoid generating such images.
I trained this negative text embedding, badv3, on a few hundred bad images generated by the model itself; the principle is similar to EasyNegative [4]. I tried training it to the point of overfitting to reduce the effect a conventional negative text embedding has on the model's art style, and this seems to work well.
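In spirit (a minimal sketch under my own assumptions, not the exact badv3 recipe), textual inversion freezes the whole model and optimizes only the embedding rows of a new placeholder token; trained on bad images, the token comes to stand for "bad image" and can then be placed in the negative prompt:

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

# SD 1.x uses this CLIP text encoder.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Register the placeholder token and give it an embedding row to learn.
tokenizer.add_tokens(["badv3"])
text_encoder.resize_token_embeddings(len(tokenizer))
token_id = tokenizer.convert_tokens_to_ids("badv3")

# Freeze everything except the input embedding matrix.
for p in text_encoder.parameters():
    p.requires_grad = False
embeddings = text_encoder.get_input_embeddings()
embeddings.weight.requires_grad = True

optimizer = torch.optim.AdamW([embeddings.weight], lr=5e-4)

def mask_gradients():
    """Call after loss.backward(): zero every gradient row except the
    placeholder token's, so only the badv3 embedding is updated."""
    mask = torch.zeros_like(embeddings.weight)
    mask[token_id] = 1.0
    embeddings.weight.grad *= mask
```

The loss itself is the same noise-prediction objective used in training, computed on batches of bad images captioned with the placeholder token.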
Compared with EasyNegative, badv3 works better on this model. I have not yet compared other negative text embeddings.
badv3 is the nth negative text embedding I have trained since deformityv6. Such embeddings are very easy to make, but the results are quite random. I have also tried an add-difference merge to remove the weights of another model trained on bad images, so far without promising results. Next, I plan to train a negative LoRA in place of negative text embeddings, so that some of the weights are directly "removed" from the model rather than merely "avoided".
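The add-difference experiment mentioned above corresponds to merged = A + m * (B - C): with B a model fine-tuned on bad images and C its base, a negative multiplier m subtracts the "bad image" delta from A. A minimal sketch (my own illustration of the standard technique):

```python
import torch

def add_difference(sd_a: dict, sd_b: dict, sd_c: dict, m: float) -> dict:
    """Add-difference merge over checkpoint state dicts: A + m * (B - C).
    B = a model fine-tuned on bad images, C = its base model; m < 0
    attempts to remove the bad-image weights from A."""
    return {k: sd_a[k].float() + m * (sd_b[k].float() - sd_c[k].float())
            for k in sd_a}
```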
This model was made to test multi-style model training; it is neither for profit nor for commercial use, purely out of interest. If there is any infringement, it will be removed immediately.
All cover images were generated with text-to-image, without any LoRA, using the negative text embedding from [1] in the negative prompt.
Users are only authorized to generate images with this model; reproduction without consent is not allowed.
Any commercial use of this model is strictly prohibited.
The showcase images in Appendix A are references for the broad categories of the model's special trigger words; the exact prompts shown are not required to reproduce them.
Please do not use this model to generate gory, violent, or pornographic images, or any infringing content! For this reason, Appendix A can only provide part of the trained keywords.
If you'd like to upload and share your own images, or would like to contribute training images for future AID models, please visit:
anime-illust-diffusion-gallery - a Hugging Face Space by Eugeoter
[2] Pretty 2.5D | Stable Diffusion Checkpoint | Civitai
[3] Multi-style Model - Celluloid-style Sci-fi Illustration - AI Accelerator Community (acceleratori.com)
[4] EasyNegative | Stable Diffusion TextualInversion | Civitai
[5] vae-ft-mse-840000-ema-pruned.ckpt · stabilityai/sd-vae-ft-mse-original at main (huggingface.co)
Until AIDV2.5: by 35s00, by agm, by ajimita, by akizero, by ask, by chicken utk, by demizu posuka, by dino, by fadingz, by fuzichico, by hamukukka, by hitomio16, by ichigo ame, by key999, by kooork55, by matcha, by mika pikazo, by modare, by myung yi, by naji yanagida, by nezukonezu32, by nico tine, by nikuzume, by ninev, by oda non, by palow, by qooo003, by rolua, by samip, by serie niai, by shirentutu, by sho, by silver, by sonomura00, by void, by wlop, by xilmo, by yoneyama mai, by yosk6000, by zumizumi
AIDV2.6 adds: by caaaarrot, by hinaki, by homutan, by kazari tayu, by kitada mo, by roitz, by teffish, by ukiatsuya, by yejji, by ziyun
AIDV2.7 adds: by poharo, by jnthed, by 7thknights, by some1else45, by yohan, by yomu, by tsvbvra
AIDV2.9 adds: by kkuni, by starshadowmagic, by star furu, by rella, by tukumi bis, by yumenouchi, by chon, by eku uekura, by tira27, by kuroume, by hachisan, by nounoknown, by kurige horse, by konya karasue, by noyu, by ame929, by muryou tada, by yun216, by nekojira, by nanmo, by wait ar, by akasaai, by momoco, by sushi0831, by taiki, by siki, by kinta, by hata, by anteiru, by lemoneco, by umaiyo puyoman, by freng, by rin7914, by shimanun, by hidulme, by whoisshe, by 5eyo, by cutesexyrobutts, by shiren, by omutatsu, by gesoking, by 3meiji, brushstrokes
AIDV2.9 update: (i) by demizu posuka; (ii) by fuzichico -> by fuzichoco; (iii) increased the resolution of the training dataset; (iv) trained with clip skip = 1.
AIDV2.91 adds: impasto, pseudo-impasto, semi-realistic, concept art, flat color, celluloid
Until AIDV2.10beta1: by 35s00, by 3meiji, by 5eyo, by 7nu, by 7thknights, by adenim, by agm, by ajimita, by akizero, by ame929, by anmi, by anteiru, by arutera, by ask, by atelier irrlicht, by bunbun, by caaaaarrot, by camu, by canking, by ccroquette, by chi4, by chicken utk, by chon, by cola, by cutesexyrobutts, by darumakarei, by dino, by dora, by dsmile9, by ei maestrl, by ekita kuro, by ekita xuan, by eku uekura, by fadingz, by fajyobore, by foomidori, by freng, by fuzichoco, by gesoking, by gomzi, by hachisan, by hakuhiru oeoe, by hamukukka, by haru, by hata, by hidulme, by hikinito0902, by hinaki, by hitoimim, by hitomio16, by hizumi, by homutan, by hotatenshi, by houk1se1, by hyatsu, by icecenya, by ichigo ame, by inoriac, by iromishiro, by iwzry, by jnthed, by joezunzun, by junsui0906, by karohroka, by kaya7hara, by kazari tayu, by killow, by kin, by kinta, by kishiyo, by kitada mo, by kkuni, by konya karasue, by kooork55, by kot rou020, by krenz, by kurige horse, by kuroume, by lalalalack, by lemoneco, by lm7, by lovelymelm, by lpmya, by mar takagi, by matcha, by matsukenmanga, by melowh, by menou, by midori xu, by mika pikazo, by misumigumi, by miv4t, by mochizukikei, by mogumo, by momoco, by momoku, by morikuraen, by mqkyrie, by muina, by munashichi, by muryou tada, by myaru, by myc0t0xin, by myung yi, by nack, by naji yanagida, by nanmo, by nardack, by narue, by nekojira, by netural, by nezukonezu32, by nico tine, by nikuzume, by nine, by nineo, by ninev, by niwa uxx, by nixeu, by noco, by noodle4cool, by nounoknown, by noyu, by oda non, by omutatsu, by onineko, by palow, by panp, by pikuson, by poharo, by poire, by potg, by pro-p, by qooo003, by rai hito, by rattan, by reiko, by rella, by rhtkd, by rin7914, by roitz, by ryuseilan, by saberiii, by sais, by sakiika, by samip, by sanosomeha, by say hana, by scottie0073, by senryoko, by serie niai, by seuhyo99, by shal-e, by shimanun, by shirabii, by shiraishi kanoya, by shiren, by shirentutu, by sho, by sia, by siki, by silver, by solipsist, by some1else45, by sonomura00, by sooon, by star furu, by starshadowmagic, by starzin07, by sui 0z0, by sul, by sushi0831, by suzukasuraimu, by taiki, by takumi bis, by teffish, by tidsean, by tira27, by tsukiho tsukioka, by tsvbvra, by ttosom, by tukumi bis, by uiiv, by ukiatsuya, by umaiyo puyoman, by void, by wait ar, by walzrj, by wanke, by whoisshe, by wlop, by xilmo, by yejji, by yogisya, by yohan, by yomu, by yoneyama mai, by yosk6000, by yumenouchi, by yun216, by yunikon147, by yunsang, by ziyun, by zumoti4