🖥️Welcome to try out the open-source GPT4V-Image-Captioner, developed by my friend and me. It offers a one-click installation and comes integrated with multiple features including image pre-compression, image tagging, and tag statistics. Recently, we also launched the webui plugin version of this tool, everyone is welcome to use it!
🌍欢迎加入QQ群"兔狲·AIGC梦工北厂",群号 :780132897 ;"兔狲·AIGC梦工南厂",群号 :835297318(入群答案:兔狲)。Telegram群聊“兔狲的SDXL百老汇”,链接:https://t.me/+KkflmfLTAdwzMzI1
One-sentence update summary: HelloWorld 7.0 is an iteratively optimized version, with the best body performance in the entire series, and further enhanced concept scope and detail richness.
Update details:
By adding negative training images, strengthening pose training, and optimizing the clip model, the accuracy of the model's limbs and hands has been improved compared to previous versions. The recommended negative prompt words are: "bad hand, bad anatomy, worst quality, ai generated images, low quality, average quality".
Extracted the fine-tuned LoRA from the official SPO model and incorporated it into HelloWorld 7.0. SPO is a further improvement of the DPO method. The SPO base model is used for better performance than the DPO XL base model and the original SDXL base model. The SPO LoRA can enhance image details & contrast and beautify images. Thanks to the technical team behind SPO.
Continued to expand the concept scope of the training set, but optimized and streamlined the training set (large training set fine-tuning is too expensive, and H800 is difficult to rent recently, can't afford the local training time). The current total training set is 20,821 images. The training set resolution distribution is as follows, and it is recommended to use several resolutions with a larger number of images for output:
(832, 1248) - Count: 7128
(896, 1152) - Count: 6250
(1248, 832) - Count: 2402
(1024, 1024) - Count: 1639
(1360, 768) - Count: 928
(1152, 896) - Count: 870
(768, 1360) - Count: 432
(960, 1088) - Count: 506
(992, 1056) - Count: 162
(1088, 960) - Count: 140
(704, 1472) - Count: 120
(1056, 992) - Count: 122
(1472, 704) - Count: 115
(1632, 640) - Count: 75
(640, 1632) - Count: 12
Used GPT4O to re-label all datasets. This time, a structured labeling method was used, with the specific structure being: "one-sentence summary description + multiple image element tags + inspired by XXX + aesthetic quality description words", where the aesthetic quality description words are divided into five levels: worst quality, low quality, average quality, best quality, and masterpiece. A typical labeling example is as follows:
conceptual art featuring a human hand wrapped in red and beige ribbons, isolated against a plain, light background, realistic style, minimalist color scheme, smooth textures, elongated and surreal aesthetic, inspired by salvador dalí's surrealist works, masterpiece
The "High-Frequency Tagging Word List" and the "High-Frequency Art Style List" involved in the Inspired by XXX for the HelloWorld 7.0 version will only be provided to commercial licensing users. Partners who have purchased Helloworld XL series model authorization in the past, please contact me if there are any omissions to get it for free.
Players can refer to the High-Frequency Tagging Word List of HelloWorld 6.0. In addition, I have also provided 150+ high-quality HelloWorld 7.0 example images in the gallery, which can be used as a reference for everyone's output. Model making is not easy, thank you players for your understanding and tolerance!
LEOSAM HelloWorld 6.0 Top 250 High-Frequency Tagging Word List
Thank you for your patience. I have been job hunting recently, which caused some delays in the HelloWorld updates. Here are the main updates in version 6.0:
HelloWorld 6.0 is an iterative improvement based on version 5.0. Based on my own testing, the realism effect is not significantly different from version 5.0. The main advantage of version 6.0 lies in its broader coverage of concepts in the training set. According to feedback, enhancements have been made in various themes including surrealism, boudoir, group photos, masks, origami, 3D renders, cars, dragons, and maternity photography. Some examples are provided in the illustrations.
HelloWorld 6.0 intentionally includes some low-quality images in the training to enhance the model's response to negative prompts. It is recommended to use the following terms in negative prompts: "low quality, jpeg artifacts, blurry, poorly drawn, ugly, worst quality".
The main body of the HelloWorld 6.0 training set employs GPT4v tagging. For images that GPT4v cannot tag, cogVQA guided by blip2-opt-6.7b is used for tagging. The tagging language style of these multimodal models differs significantly from the traditional WD1.4 tagger. To facilitate more accurate triggering of different concepts in the training set, I have compiled the top 250 high-frequency tagging words from the HelloWorld 6.0 training set. You can view these high-frequency words in this document.
Finally, although SD3 is about to be released, I will still update to HelloWorld XL 7.0, hoping to achieve greater enhancements in version 7.0!
This model is a run-accelerated version of the HelloWorld SDXL base model, incorporating both SDXL-Lightning technologies. Equipped with the Eular a sampler and CFG 1, it is capable of generating images in 6-8 steps, which is three times faster than the original SDXL version. Moreover, upon comparison, its imaging results are superior to those of LCM or Turbo versions.
The recommended parameters for generating images with this model are:
Sampler: Eular a (Important! The model is specifically adapted to Eular a, other samplers may not yield as good results)
CFG scale: 1
Sampling steps: 8 steps (6~8 steps are acceptable)
Hires algorithm: ESRGAN 4x / 8x_NMKD-Faces_160000_G
Hires Upscale factor: 1.5x
Hires steps: 8 steps
Hires Denoising strength: 0.3
HelloWorld 5.0 is the most substantial update in the history of the HelloWorld series, tagged with GPT-4v, and has undergone significant fine-tuning in fields such as science fiction, animals, architecture, and illustration.
Comparative tests show improvements in this version include:
1. More varied and dynamic character poses and image compositions, creating visually engaging pictures;
2. The film dataset has been extensively trained. While the film texture was weak from versions 2.0 to 4.0, many fans missed the leogirl style of version 1.0. Therefore, this update has specifically strengthened the film texture without compromising other photographic qualities. The film texture can be triggered by phrases such as film grain texture and analog photography aesthetic;
3. Enhanced expressiveness in themes like science fiction, thriller, and animals, with mechas and other subjects having a more designed feel. Animals like snow leopard, red panda, giant panda, tiger, the Pallas's cat, and domestic cats and dogs are more lifelike;
4. Thanks to GPT tagging, prompt adherence and conceptual accuracy have been further improved.
However, the drawbacks of this version include:
1. As this is a substantial fine-tuning update, the error rate for limbs and such may slightly increase, a normal phenomenon when moving out of a comfort zone into new areas of relative optimization. Previous versions underwent extensive limb testing for improvements, while the new version had limited time for such enhancements. Nevertheless, the accuracy of limbs in this version is at least higher than in version 1.0, and I will continue to make improvements in future updates.
2. Due to the reinforced film texture, even though GPT tagging is as accurate as possible, there can be an unavoidable default warm tone in images. However, you can use prompts like studio light or sharp focus to produce high-definition studio-quality images, and with proper use of prompts, the output can have better skin tones and visual appeal than previous versions.
3. This version includes more full-body character images to enhance the full-body effect, so the model may produce wider scenes than before if no specific character composition is directed. Currently, the facial details in 1024 resolution full-body shots might be less sharp compared to half-body or close-up shots. However, this can be improved by adetailer and a 1.5x Hires. fix at 0.3 intensity, or by using prompts like specifying composition to avoid generating full-body images.
4. Since a small number of high-quality illustration datasets have been added, there is a chance that prompts related to animated styles will produce animated images. If this concerns you, please adjust your prompts accordingly.
These are the main updates for this version. Training the SDXL base model is challenging, and when the training set approaches ten thousand images, the cost for tagging and training for each model exceeds 300 USD. I welcome everyone to use the model and appreciate any feedback you can provide! If you find this model satisfactory, I would be immensely grateful if you could help spread the word about it.
HelloWorld4.0 is a progressive transitional version from tagging with blip+clip to tagging with GPT4V. I initially trained a pure GPT4V tagging model, and then merged it with a large proportion of the HelloWorld3.2 version and 0.05 proportion of Juggernaut XL (to adjust the skin tone). The new version has shown improvements in prompt compliance and concept coverage compared to the 3.2 version.
The new GPT4V tagging training set has doubled from the 4000 images of the helloworld3 series to 8000 images, covering not only portraits but also animals, architecture, nature, food, illustrations, and more. However, the pure GPT4V version encountered an overfitting problem, which is preliminarily attributed to the doubling of the number of training images. One of the next steps in iterative optimization is to find out how to include as many non-portrait concepts as possible while ensuring sufficient training of portraits. At this stage, a fusion of the new and old versions has been used for fine-tuning to ensure a smooth transition between versions, so the expanded concept set and the advantages brought by GPT4V tagging are not very perceptible at the moment. These advantages will become increasingly apparent in the subsequent generations 5 and 6 of the model.
Version 3.2 is an iteration optimized with DPO technology, and compared to version 3.0, there are optimizations in skin tone and limb accuracy, but the improvements are not significant. That's why this version is marked as 3.2 rather than being labeled as 4.0.
The new version has expanded the training set, enhancing the model's ability to express in different artistic styles, including science fiction and art.
It has integrated a self-made quality enhancement LoCon (created using slider technology), to improve image texture and alleviate issues of distortion in fingers and limbs.
Thank you all for your patience. After overcoming various challenges, the HelloWorld 2.0 version is finally ready to be presented to you all in a state that I'm satisfied with. The main differences between HelloWorld 2.0 and 1.0 are as follows:
HelloWorld 2.0 no longer requires trigger words, and the results are comparable in quality to version 1.0 with trigger words.. The trigger word 'leogirl' in 1.0 was highly associated with East Asians. After the cancellation of the trigger words, while words like '1girl' will still likely generate East Asian portraits when race is not specified, you can now specify the race by using keywords like nationality, skin color, etc. For example, the trigger effects for words like 'Chinese', 'Russian', 'Iranian', 'Jamaican', 'Kenyan', 'dark-skinned', 'pale-skinned', etc., are listed below.
You can also get different styles of characters by writing the names of people from different countries and genders in the prompt, such as Han Meimei (China), Sophie Martin (France), Priya Patel (India), Fatima Al-Hassan (Arab), Wanjiru Mwangi (Kenya). The above prompts are just examples, there are many available prompts and ways to play, and you're welcome to explore and share them by yourself.
HelloWorld 2.0 has balanced the quality/color and offers more style options. The 1.0 version, when used with 'leogirl', would likely produce images with a strong film texture. HelloWorld 2.0 is no longer tied to a film texture and can be customized with some quality-related prompts. Some prompts that have been tested and work well include:
high-end fashion photoshoot, product introduction photo, popular Korean makeup, aegyo sal, Sharp High-Quality Photo, studio light, medium format photo, Mamiya photography, analog film, Medium Portrait with Soft Light, real-life image, refined editorial photograph, raw photo, real photo, Scanned Photo, film still
The color effects of these prompts are as follows:
The training set for HelloWorld 2.0 significantly increased the proportion of full-body photos to improve the effects of SDXL in generating full-body and distant view portraits. Although it has improved compared to version 1.0, it is still strongly recommended to use 'adetailer' in the process of generating full-body photos. Also, for users with enough video memory (24g), it is recommended to perform 1.5x high-resolution repair on the image, which can significantly improve facial details.
Special reminder: When using the HelloWorld 1.0 model, please remember to add the trigger word "leogirl".
Distinct from SD1.5 base model “MoonFilm”, “HelloWorld” is a brand new realistic SDXL base model series, . In order to allow more users to discover HelloWorld, I have retained the original Moonfilm's model link. It can be perceived as a spiritual continuation of Moonfilm on the SDXL new platform, but HelloWorld aims to achieve more than just the pursuit of realism and film-like quality in portraits. Thanks to the far superior amount of information and text understanding capabilities of SDXL compared to SD1.5, HelloWorld is a base model that seeks to realistically depict all things, or in other words, I hope to gradually build a virtual photography world using HelloWorld.
The realistic base model of SD1.5 has developed to a quite mature stage, and it is unlikely to have a significant performance improvement. Unless there is a breakthrough technology for SD1.5 platform, the Moonfilm & MoonMix series will basically stop updating. I will devote my main energy to the development of the HelloWorld SDXL large model. The 1.0 version is now available for download, and the 2.0 version is being developed urgently and is expected to be updated in early September.
As a brand new SDXL model, there are three differences between HelloWorld and traditional SD1.5 models:
Unlike SD1.5 base models, which typically do not include trigger words, please remember to use the trigger word "leogirl" when using HelloWorld 1.0. This ensures that the SDXL model triggers the training set effect more stably.
The HelloWorld model supports direct output at a resolution of 1024*1024 pixels, eliminating the need for high-resolution magnification. The quality of close-up portrait directly output is not inferior to the SD1.5 version, but there are still flaws when outputting distant portraits directly. Therefore, it is suggested to use ADetailer plugin, which can effectively correct the problems of distant faces.
SDXL now allows for easier output using simple natural language prompts. It is recommended to try more natural language prompts, which will result in better outcomes when outputting AI realistic photos.
After multiple rounds of testing, the suggested drawing parameter settings are:
Steps ≥ 25
Sampler: DPM++ 2M Karras
CFG scale: 10
Size ≥ 1024x1024
ADetailer: open
Everyone is welcome to try HelloWorld and provide plenty of feedback. Your valuable opinions are very important for the next step of model improvement!
The HelloWorld series of models (hereinafter "the Model") has been crafted by myself (hereinafter "the Owner") with the assistance of the LiblibAI platform. Republishing the Model on platforms excluding LiblibAI and Civitai is unauthorized by the Owner.
The Owner permits the use of images generated by the Model for non-commercial educational or informative purposes at no cost, on the condition that:
- Users adhere to applicable laws and do not violate the rights of the Model or any third-party.
- Attribution for the images must be clearly stated as "created by LEOSAM's HelloWorld base model".
For any form of commercial utilization, a prior commercial license agreement with the Owner is required. For inquiries related to commercial licensing and model personalization, please reach out to the Owner via the contact information available on the Owner's homepage.
The development and free distribution of the SDXL model represent significant endeavors. The Owner pledges ongoing complimentary updates to the HelloWorld model for individual enthusiasts as a token of appreciation for the community's contributions to open-source development. Collaborative commercial engagements are vital for the Model's advancement and refinement. The Owner appreciates every user for their understanding and support.
Unauthorized use may breach applicable laws and carry legal repercussions. The Owner retains exclusive rights to interpret this statement, which is governed by prevailing laws and regulations.