Like many others, I have been playing around with DiffusionBee, the application / wrapper that lets you run Stable Diffusion on a Mac. The tool takes a textual prompt and produces images from it. You can also provide an input image, which is used as a base for the output images. There are also tools to fill in parts of a picture, or to out-paint, i.e. paint the stuff outside of the frame.
The app is pretty intuitive; some tasks, like changing the image model, are more involved, but that is not exactly the first thing a user would want to do. Depending on the settings, it takes around a minute to generate a picture out of thin air. Selecting the right prompt, i.e. the keywords that yield the picture you want, takes some getting used to as well.
For me, the most impressive aspect is that you can actually run such a complex tool on a laptop. I remember running neural networks written in CLOS on Sun workstations at the University of Geneva; that was not fast.
This was also the first time I noticed a dramatic difference between an ARM-based M1 Mac and an Intel one. The program just runs on the M1, barely causing any CPU load, as all the processing is offloaded to the Neural Engine. The same process brings an Intel machine to a standstill. This was more dramatic than the switch from the 68K processors to PowerPC, when suddenly the Mandelbrot set could be computed in a few seconds.
Choosing the right prompts takes some trial and error, and you get the best results when you don’t stray too far from the clichés: ask for fantasy female characters in armour, and you get pretty nice images. The pictures are far from perfect, the eyes are slightly off and the hands tend to be weird, but this is still much better than what many people, myself included, can draw. It is also interesting that the system fares way better with fully rendered pictures than with line art.
Remember that this is a pretty simple image model running on a laptop from 2021. The images produced by Midjourney are much better. There is a vigorous debate about the ethics underlying the training data, but Adobe has already announced that their system, named Firefly, is trained on Adobe-owned images or public-domain ones.
I suspect that to build a really good model, a well-labelled training set, with people of different shapes, ethnicities, and ages, and a large corpus of animals, plants, and objects, matters more than the works of many artists. Because for the moment, Stable Diffusion cannot draw a steam locomotive…