Diffusion Model

Diffusion neural networks are trained by taking data and progressively destroying it with added noise, which scrambles the information, and then learning to predict how to reassemble the data. As I understand it, when a user inputs a text prompt, random noise is generated and the neural network ‘works out’ how to unscramble that noise into a ‘best fit’ for its (predictive) interpretation of the text prompt. Adobe Firefly, Stable Diffusion, Perchance, Midjourney, SnapFusion, Imagen, etc. are all examples of image generators built on diffusion models.

Fig 1. A simple representation of the Diffusion training process.


Fig 2. A detailed representation of the Diffusion training process.
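For readers who like to see an idea in code, the noising half of training can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not any product's actual training code: the four-pixel "image", the schedule values and the function names add_noise and recover are all made up for illustration. The blending formula itself (square roots of "signal kept" and "signal lost") is the standard one used in diffusion models.

```python
import math
import random

def add_noise(pixels, signal_kept, rng):
    """Blend pixel values with random (Gaussian) noise.

    signal_kept is the share of the original image that survives:
    1.0 leaves the image untouched, 0.0 leaves nothing but noise.
    Returns the noisy pixels and the noise that was mixed in.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    a = math.sqrt(signal_kept)
    b = math.sqrt(1.0 - signal_kept)
    noisy = [a * p + b * n for p, n in zip(pixels, noise)]
    return noisy, noise

def recover(noisy, noise, signal_kept):
    """Undo add_noise when the noise is known (needs signal_kept > 0)."""
    a = math.sqrt(signal_kept)
    b = math.sqrt(1.0 - signal_kept)
    return [(x - b * n) / a for x, n in zip(noisy, noise)]

rng = random.Random(0)
image = [0.9, -0.2, 0.5, -0.7]  # toy four-pixel "image", values in [-1, 1]

# Made-up schedule: less of the image survives at every step.
schedule = [1.0, 0.75, 0.5, 0.25, 0.0]
steps = [add_noise(image, kept, rng) for kept in schedule]

for kept, (noisy, _) in zip(schedule, steps):
    print(kept, [round(v, 2) for v in noisy])
```

During training the network is shown the noisy pixels and asked to guess the noise. The recover function shows why that guess is useful: whoever knows the noise can work backwards to the image.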


Diffusion networks are trained on datasets made from images whose resolution has been set to a fixed size, commonly 512 x 512 pixels. That’s a sort of OK resolution, which comes in at 262,144 pixels. For each pixel, the dataset holds 4 pieces of information: 1. pixel coordinate, 2. colour, 3. luminance and 4. saturation (1,048,576 pieces of information in total). Once a new dataset has been generated, the relevant software is able to turn it back into an image for viewing. No one’s artwork, as such, is used or referenced.
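The arithmetic above is easy to check. The sketch below reproduces the two figures from the text and, as a comparison, adds one assumption that is not in the text: that an ordinary image file stores each pixel as three colour numbers (red, green, blue) of 8 bits each, which is a common convention.

```python
width, height = 512, 512
pixels = width * height

values_per_pixel = 4  # the four items counted in the text above
total_values = pixels * values_per_pixel

# Assumption for comparison only: 3 colour channels of 8 bits each per pixel.
rgb_bits = pixels * 3 * 8
rgb_bytes = rgb_bits // 8

print(pixels)        # 262144
print(total_values)  # 1048576
print(rgb_bits)      # 6291456
print(rgb_bytes)     # 786432
```

The point of the comparison is that the count of 1,048,576 is a number of values, which is not the same thing as a number of bits in a file.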

What happens when you make an image?

The software you are using starts with this: a scrambled dataset which would appear as noise if it were to be rendered.

Inputting a prompt activates a pipeline that decodes the text and passes it downstream to the neural network, where the ‘trained’ network ‘interprets’ the text and generates a dataset that is possibly a good fit for regenerating an image that addresses the text prompt. In this process anything can happen, and often does. Every now and then something good happens, which may or may not require substantial post-production and/or re-rendering.

The above is a ‘visual representation’ of the breakdown of a dataset for images that are used to train a Diffusion Neural Network. Billions of these datasets are used.
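The image-making step described in this section (noise in, prompt-guided image out) can also be sketched as a toy loop. Everything here is a stand-in: TOY_KNOWLEDGE, toy_predict_noise and generate are hypothetical names, and the "network" is just a lookup table of two four-pixel answers. A real generator uses a trained network and a proper diffusion sampler, so treat this only as an illustration of the idea of removing guessed noise a little at a time.

```python
import random

# Hypothetical stand-in for a trained network. A real model has learned from
# billions of images; this one simply looks up a fixed four-pixel answer.
TOY_KNOWLEDGE = {
    "bright": [0.9, 0.8, 0.9, 0.8],
    "dark": [-0.9, -0.8, -0.9, -0.8],
}

def toy_predict_noise(pixels, prompt):
    """Guess which part of the current pixels is noise, given the prompt."""
    target = TOY_KNOWLEDGE[prompt]
    return [p - t for p, t in zip(pixels, target)]

def generate(prompt, steps=20, seed=0):
    """Start from pure noise and strip away the guessed noise step by step."""
    rng = random.Random(seed)
    pixels = [rng.gauss(0.0, 1.0) for _ in range(4)]  # scrambled starting point
    for step in range(steps):
        noise_guess = toy_predict_noise(pixels, prompt)
        fraction = 1.0 / (steps - step)  # remove a larger share as steps run out
        pixels = [p - fraction * n for p, n in zip(pixels, noise_guess)]
    return pixels

print([round(v, 2) for v in generate("bright")])
print([round(v, 2) for v in generate("dark")])
```

Because the toy lookup table is never wrong, this loop always lands on the same answer for a given prompt. A real network's guesses are imperfect and depend on the starting noise, which is one reason the same prompt can give very different results from one attempt to the next.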

What does this mean for you as a teacher? Why engage with this? 

At the outset, generating good quality images is costly, and generating exhibition-quality images could prove prohibitively expensive for students. The best approach is possibly to treat this as an experiential practice for teaching processes and problem solving. Structuring complex prompts, negative prompts, outpainting, etc. would certainly be beyond the scope of most, unless they are personally well versed in generative AI as an art practice.

Using Midjourney in a school environment is out of the question because it’s on a Discord server and access would be blocked by the DoE. However, that does not stop individual students who have the capability from using it themselves. At HSC level there are currently no rules that prohibit the use of AI for image generation, and it’s worth noting that at the recent Ballarat International Foto Biennale there was, for the first time, a category for AI-generated images (Promptography). Most people don’t realise how painstakingly difficult it is to control anatomical accuracy, colour, composition, lighting, background detail, etc.

Free online text-to-image generators are reasonably plentiful. Whether you can access these from behind the DoE network is another matter.

Adobe Firefly is one of the few, if not the only one, that has contractual approval. What remains to be seen, however, is whether the subscription cost that applies to the AI rendering service is going to be passed on to the DoE under its contract.