GENERATIVE ADVERSARIAL NETWORKS
Personally, I’m particularly fond of GANs. Networks of this type produce a seemingly endless variety of solutions in the images they generate. I’ve been working with a VQGAN + CLIP model (Wombo Dream) for about eight months and in that time have generated something like 30,000 images. This staggering amount of processing has been possible because I own the software I use and am therefore not tied to a subscription service. That said, the learning has been exceptional.
Like diffusion models, generative adversarial networks are trained on datasets. There is little point in my explaining how this works here; read the webpage that the link at the top of this page points to. What that publication clearly reveals is that GANs are exceptionally flexible and useful.
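Still, the core idea is simple enough to sketch: a generator and a discriminator are trained against each other, the one trying to fool the other. Below is a minimal, illustrative PyTorch sketch on a toy task (learning a 1-D Gaussian rather than images); it is not drawn from the linked article, just the standard adversarial loop.

```python
# Minimal GAN sketch (assumption: toy 1-D data, not images).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Generator: noise z -> fake sample
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
# Discriminator: sample -> probability it is real
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0   # the "dataset": samples from N(3, 0.5)
    fake = G(torch.randn(64, 8))

    # 1) Train the discriminator to tell real from fake.
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(64, 1)) + \
             bce(D(fake.detach()), torch.zeros(64, 1))
    loss_d.backward()
    opt_d.step()

    # 2) Train the generator to fool the discriminator.
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(64, 1))
    loss_g.backward()
    opt_g.step()

print(G(torch.randn(256, 8)).mean().item())  # should drift toward 3.0
```

The same two-player loop scales up to images: swap the linear layers for convolutional networks and the 1-D samples for pixel arrays.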
Why GANs?
‘A text-to-image production approach seeks to produce, from text descriptions, photorealistic images that are semantically coherent with the provided descriptions. Applications for creating photorealistic visuals from text include photo editing and more. Strong neural network topologies, such as GANs (Generative Adversarial Networks), have been shown to produce effective outcomes in recent years. Two very significant factors, visual realism and content consistency, must be taken into consideration when creating images from text descriptions. Recent substantial advances in GANs have made it possible to produce images with a high level of visual realism; however, generating images from text while ensuring high content consistency between the text and the generated image is still ambitious. To address these two issues, a Bridge GAN model is proposed, where the bridge is a transitional space containing meaningful representations of the given text description. The proposed system incorporates the Bridge GAN and a char CNN–RNN model to generate images with high content consistency, and the results show that the proposed system outperformed existing systems.’ (1)
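To make the conditioning idea in that passage concrete, here is a hedged PyTorch sketch: a stand-in text encoder maps a description to an embedding (playing the role of the ‘transitional space’), and the generator consumes noise concatenated with that embedding. The module names and sizes are illustrative assumptions, and the tiny GRU encoder is a stub for the paper’s char CNN–RNN; this is not the authors’ Bridge GAN.

```python
# Hedged sketch of text-conditioned generation (assumed shapes/names).
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Stand-in for the char CNN-RNN text encoder (assumption)."""
    def __init__(self, vocab=128, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, 32)
        self.rnn = nn.GRU(32, dim, batch_first=True)

    def forward(self, chars):                # chars: (batch, seq) of char ids
        _, h = self.rnn(self.embed(chars))
        return h.squeeze(0)                  # (batch, dim) text embedding

class CondGenerator(nn.Module):
    """Noise + text embedding -> tiny 16x16 'image' for illustration."""
    def __init__(self, z_dim=100, t_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + t_dim, 256), nn.ReLU(),
            nn.Linear(256, 16 * 16 * 3), nn.Tanh())

    def forward(self, z, t):
        return self.net(torch.cat([z, t], dim=1)).view(-1, 3, 16, 16)

enc, gen = TextEncoder(), CondGenerator()
chars = torch.randint(0, 128, (4, 20))       # four dummy "descriptions"
img = gen(torch.randn(4, 100), enc(chars))
print(img.shape)                             # torch.Size([4, 3, 16, 16])
```

Training then runs the same adversarial loop as above, with the discriminator also shown the text embedding so it can penalise images that are realistic but inconsistent with the description.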
What’s available that employs GANs?
“DALL·E is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs. We’ve found that it has a diverse set of capabilities, including creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images”. From https://openai.com/research/dall-e.
With this type of neural network, most software is subscription-only, which is fine for a student’s private use but makes it difficult for the staff connected with them to engage in any meaningful support.
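For what it’s worth, programmatic access to DALL·E runs through OpenAI’s paid API, which is exactly the subscription barrier just described. A rough sketch with the official openai Python SDK, assuming an API key and billing are already set up:

```python
# Rough sketch of calling DALL·E via OpenAI's Python SDK.
# Assumptions: the `openai` package is installed, OPENAI_API_KEY is set
# in the environment, and each request is billed to that account.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.images.generate(
    model="dall-e-3",
    prompt="an anthropomorphized fox reading a newspaper, oil painting",
    n=1,
    size="1024x1024",
)
print(response.data[0].url)  # temporary URL to the generated image
```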
1. Text to Image Synthesis Using Bridge Generative Adversarial Network and Char CNN Model. In: Natural Language Processing and Information Systems. Springer Nature, Switzerland.
Other developments
- Semantic-Spatial Aware GAN.
- Multi-generator Text Conditioned Generative Adversarial Networks.
- (2 = important reading)