The Comparison of the Effectiveness and Efficiency of Fine-Tuning Models on Stable Diffusion in Creating Concept Art

Abdul Bilal Qowy; Ahmad Nur Ihsan; Sri Hartati

doi:10.15408/jti.v17i1.37942

Authors

Abdul Bilal Qowy Computer Science, UAG University, Indonesia
Ahmad Nur Ihsan Computer Science, UAG University, Indonesia
Sri Hartati Computer Science, UAG University, Indonesia

Abstract

This research aims to overcome the limitations of the Stable Diffusion model in creating concept art. Although Stable Diffusion is widely regarded as one of the strongest models for generating concept artwork, the process of creating concept art still needs to be simplified, and the most suitable generative model still needs to be identified. This research used three methods (the Latent Diffusion Model, DreamBooth fine-tuning, and Stable Diffusion) and compared DreamBooth with Textual Inversion as fine-tuning approaches. The results show that DreamBooth produces a more realistic painting style, while Textual Inversion tends toward a fantasy, cartoon-like style. Although both methods achieve relatively high effectiveness with minimal differences, DreamBooth proved more effective based on the consistency of its FID, PSNR, and visual perception scores. DreamBooth is also more efficient in training time, although it requires more memory, while the inference times of the two methods are comparable. This research contributes to the development of artificial intelligence in the creative industries and opens up opportunities to improve the use of generative models in creating concept art.
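The abstract judges effectiveness partly by FID and PSNR. As a minimal sketch of how these two metrics are commonly computed (not the authors' evaluation code), the Python below implements PSNR = 10 * log10(MAX^2 / MSE) for a pair of same-sized images, and FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)) from two feature matrices. It assumes NumPy, 8-bit images (MAX = 255), and feature vectors that have already been extracted by some network. Standard FID uses Inception-v3 pooling features, which this sketch does not compute. All function names are hypothetical.

```python
import numpy as np


def psnr(reference: np.ndarray, test: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    Assumes 8-bit pixel values by default (max_value=255).
    """
    reference = reference.astype(np.float64)
    test = test.astype(np.float64)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no noise at all
    return float(10.0 * np.log10(max_value ** 2 / mse))


def _sqrt_psd(matrix: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off
    # V * sqrt(lambda) scales each eigenvector column; (.) @ V.T rebuilds the matrix.
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def frechet_distance(mu1: np.ndarray, sigma1: np.ndarray,
                     mu2: np.ndarray, sigma2: np.ndarray) -> float:
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    root1 = _sqrt_psd(sigma1)
    # sqrt(S1) @ S2 @ sqrt(S1) is symmetric PSD and has the same eigenvalues as
    # S1 @ S2, so summing the square roots of its eigenvalues gives Tr((S1 S2)^(1/2)).
    inner_eigvals = np.linalg.eigvalsh(root1 @ sigma2 @ root1)
    covmean_trace = np.sum(np.sqrt(np.clip(inner_eigvals, 0.0, None)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * covmean_trace)


def fid_from_features(real_feats: np.ndarray, fake_feats: np.ndarray) -> float:
    """FID from (n_samples, n_features) feature matrices of real and generated images."""
    mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    # atleast_2d keeps the single-feature case as a 1x1 matrix instead of a scalar.
    sigma_r = np.atleast_2d(np.cov(real_feats, rowvar=False))
    sigma_f = np.atleast_2d(np.cov(fake_feats, rowvar=False))
    return frechet_distance(mu_r, sigma_r, mu_f, sigma_f)
```

Lower FID means the feature statistics of the generated set are closer to those of the reference set, while higher PSNR means closer pixel-level agreement with a paired reference image. PSNR therefore needs one reference image per output, whereas FID compares whole sets of images.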
