If you’ve spent any time at all on social media recently, the Pope in a white puffer jacket or “Harry Potter by Balenciaga” (and its several spin-offs) may have caught your eye. Of course, AI-generated art and video are everywhere these days. To those of us who have only recently woken up to the AI revolution, the technology feels very recent. Arguably the most popular AI image generator, Midjourney, is not even a year old.
Yet, computer-generated art has been slowly advancing for more than half a century and very quickly for the last decade or so. The evolution of neural networks, a crucial component of modern AI-generated imagery, started even longer ago – all the way back in the 1940s.
Algorithm art: The early days
In some sense, the birth of AI-generated imaging can be traced back to the 1960s, when researchers started exploring the use of computer algorithms to create digital images. One of the earliest examples is the work of A. Michael Noll, who, starting in 1962, used a digital computer at Bell Labs to create abstract, algorithmically generated artworks.
In the years that followed, researchers continued developing rule-based algorithms to create digital images. One example is Benoit Mandelbrot’s use of fractals. These are complex geometric patterns that can be repeated infinitely at smaller and smaller scales. In the 1980s, Mandelbrot pioneered the use of computer graphics in creating and displaying fractal geometric images.
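To get a sense of how little "rule" a fractal needs, here is a minimal Python sketch (my own illustration, not code Mandelbrot used) that draws a rough ASCII view of the Mandelbrot set by applying one simple formula over and over to every point of a grid:

```python
# A minimal sketch: mark each point of the complex plane by whether the
# repeated rule z -> z*z + c stays bounded. Grid size and iteration count
# are arbitrary illustrative choices.
def mandelbrot_ascii(width=60, height=24, max_iter=30):
    for row in range(height):
        line = ""
        for col in range(width):
            # Map the character grid onto the region of the complex plane
            # where the Mandelbrot set lives.
            c = complex(-2.0 + 3.0 * col / width, -1.2 + 2.4 * row / height)
            z = 0
            n = 0
            while abs(z) <= 2 and n < max_iter:
                z = z * z + c   # the one simple rule behind the fractal
                n += 1
            line += "#" if n == max_iter else " "
        print(line)

mandelbrot_ascii()
```

Every "#" marks a point that stays bounded under the repeated rule; the endlessly detailed boundary of that region is the fractal.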
Cellular automata offered another early route to algorithmic pattern-making. In 1970, the mathematician John Conway devised the Game of Life, the most famous cellular automaton, and in the 1980s Stephen Wolfram systematically studied how such systems give rise to complex patterns. A cellular automaton is a mathematical system that creates complex patterns through simple rules. Imagine a grid made up of small squares, or cells, where each cell can be either "on" or "off". The state of each cell is updated based on the states of its neighboring cells, according to a set of rules. These rules determine whether a cell turns on or off in the next iteration of the grid.
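Here is a minimal sketch of Conway's Game of Life in Python. The rules are the standard ones; the grid size and the starting "glider" pattern are just illustrative choices:

```python
# A minimal sketch of Conway's Game of Life (standard rules).
def step(grid):
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live neighbours (the grid wraps around at the edges).
            neighbours = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            # Conway's rules: a live cell survives with 2 or 3 neighbours;
            # a dead cell comes alive with exactly 3.
            new[r][c] = 1 if neighbours == 3 or (grid[r][c] and neighbours == 2) else 0
    return new

# A small grid seeded with a "glider", a pattern that travels diagonally.
grid = [[0] * 10 for _ in range(10)]
for r, c in [(1, 2), (2, 3), (3, 1), (3, 2), (3, 3)]:
    grid[r][c] = 1

for _ in range(4):  # print a few generations
    print("\n".join("".join("#" if x else "." for x in row) for row in grid), "\n")
    grid = step(grid)
```

Running it prints a few generations of the grid; the glider drifts across the board, a surprisingly lifelike behaviour produced by nothing more than neighbour-counting.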
Overall, the early days of AI-generated images were characterized by the use of simple rule-based algorithms to create relatively low-resolution and abstract images. However, these early experiments laid the groundwork for future developments in the field.
In the 1990s, the advancement of digital processing power and the availability of specialized software allowed researchers and artists to create more complex and sophisticated computer-generated images.
The dawn of neural networks
The introduction of Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) in the early 2010s represented a dramatic breakthrough in the field. Before, image generation was limited by rule-based algorithms that could only generate simple, low-resolution images. Now, it became possible to create complex, realistic pictures. This sparked a new wave of research and innovation.
The key difference was that GANs and VAEs are neural networks rather than sets of hand-written rules.
What are neural networks?
A neural network is a type of machine-learning algorithm that is inspired by the human brain. Just like brains have many interconnected neurons, a neural network has interconnected nodes. Each node takes in some input, performs some math on it, and then sends the result to other nodes. By combining the results of many nodes in many layers, a neural network is able to do things like recognize objects.
Imagine we want to teach a neural network how to recognize cats in pictures. We would train the network by showing it many pictures, some containing cats and some not, and telling it which is which. Each picture would be turned into a bunch of numbers that represent the colors of the pixels in the image. The network would then do some math with those numbers to figure out what features make a cat a cat, like pointy ears or a fuzzy tail.
Once the network has learned what features make a cat a cat, you can show it new pictures that it's never seen before and ask it if there's a cat in the picture. It will take in the numbers that represent the new picture, do the same math it did with the cat pictures it learned from, and give you an answer: either "Yes, there's a cat in this picture" or "No, there's no cat in this picture."
In other words, unlike rule-based algorithms, which rely on explicit rules or logic to make decisions, neural networks rely on pattern recognition and statistical learning.
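To make that concrete, here is a minimal sketch of such a network, assuming the PyTorch library. Random tensors stand in for real photographs; an actual cat detector would be trained on thousands of labelled images and would use a much larger network:

```python
# A minimal sketch, assuming PyTorch is installed. Random tensors stand in
# for real photos; a real classifier would load labelled images instead.
import torch
import torch.nn as nn

# Each 32x32 colour picture becomes 3*32*32 = 3072 numbers (the pixel colours).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 64),  # a layer of 64 interconnected nodes
    nn.ReLU(),
    nn.Linear(64, 1),            # one output node: "cat" or "not cat"
)

images = torch.rand(16, 3, 32, 32)             # 16 stand-in "photos"
labels = torch.randint(0, 2, (16, 1)).float()  # 1 = cat, 0 = not cat

loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5):                  # show the pictures a few times
    prediction = model(images)          # the network's best guess
    loss = loss_fn(prediction, labels)  # how wrong the guesses were
    optimizer.zero_grad()
    loss.backward()                     # work out how to adjust each node
    optimizer.step()                    # nudge the weights a little
    print(f"pass {epoch}: loss {loss.item():.3f}")
```

The important point is that nobody writes a rule like "pointy ears means cat": the numbers inside the two Linear layers are adjusted automatically until the network's answers match the labels.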
The evolution
The evolution of neural networks can be traced back to the 1940s. This is when Warren McCulloch and Walter Pitts proposed a mathematical model of artificial neurons, which formed the foundation of modern neural networks. Since then, neural networks have undergone several phases of evolution, each driven by advancements in computing hardware, software, and algorithms.
The first phase lasted from the 1940s to the early 1960s. It was characterized by the development of basic neural network models, such as the perceptron, which was capable of learning linearly separable patterns.
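As a toy illustration of what learning a "linearly separable" pattern looks like, here is the classic perceptron update rule in a few lines of Python, learning the logical AND function (my own toy example, not historical code):

```python
# A minimal sketch of the perceptron learning rule on the AND function,
# which is linearly separable.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]

weights = [0.0, 0.0]
bias = 0.0
lr = 0.1  # learning rate

for _ in range(20):  # a few passes over the data are enough here
    for (x1, x2), target in zip(inputs, targets):
        output = 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
        error = target - output
        # The perceptron rule: nudge the weights toward the correct answer.
        weights[0] += lr * error * x1
        weights[1] += lr * error * x2
        bias += lr * error

print(weights, bias)  # a separating line for the AND pattern
```

Because AND can be separated by a straight line, the weights quickly settle on a correct answer. A pattern like XOR, which no straight line can separate, defeats a single perceptron, one of the limitations that stalled this first phase.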
The second phase ran from the late 1960s into the 1990s. During this time, researchers developed more powerful training methods, most notably the backpropagation algorithm, which enabled multi-layer networks to learn non-linear patterns. Interest waned toward the end of this period, however, as machine learning techniques based on decision trees and support vector machines outperformed neural networks in many practical applications.
The third phase of neural network evolution began in the late 2000s and is ongoing. This phase is characterized by the development of deep learning techniques, which use neural networks with multiple layers to learn hierarchical representations of data. The success of deep learning has been fueled by advancements in computing hardware, which enable efficient training of large-scale neural networks.
Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs) were introduced by Ian Goodfellow and his colleagues in 2014. A GAN is a specific type of neural network. Like other neural networks, it learns patterns and relationships from its training data; what makes it generative is that it uses those patterns to produce new data.
But specifically, a GAN is made up of two parts: a generator and a discriminator. The generator creates new images, and the discriminator tries to figure out whether they're real or AI-generated.
The generator starts by taking a small set of random numbers and using them to create an image. At first its output looks like little more than noise, but as training progresses it produces increasingly complex and realistic images. The discriminator's job is to look at images and decide if they're real or fake. Eventually, the generator gets so good that the discriminator can't tell the difference.
That's how a GAN can create new, realistic images.
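Here is a heavily simplified sketch of that two-player training loop, assuming PyTorch. Small random vectors stand in for real images, so it only illustrates the structure, not a working image GAN:

```python
# A minimal sketch of the GAN idea, assuming PyTorch. Random "real" vectors
# stand in for real photographs.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 16))
discriminator = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))

loss_fn = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

for step in range(200):
    real = torch.randn(32, 16) + 2.0   # stand-in for a batch of real images
    noise = torch.randn(32, 8)         # random numbers fed to the generator
    fake = generator(noise)

    # Discriminator: learn to label real data 1 and generated data 0.
    d_loss = loss_fn(discriminator(real), torch.ones(32, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(32, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: try to make the discriminator call its output "real".
    g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

print("a generated sample:", generator(torch.randn(1, 8)))
```

The two networks are trained in alternation: the discriminator gets better at telling real from fake, and the generator gets better at fooling it.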
Some early examples of images created this way can be seen in the original 2014 GAN paper.
In 2018, researchers at Nvidia introduced StyleGAN, a GAN with a redesigned generator that gives fine-grained control over the "style" of its output, from coarse attributes such as pose down to fine details. In 2019, researchers at DeepMind, Google's AI lab, introduced another variation, called BigGAN, which scales GAN training up dramatically and incorporates a truncation trick that helps control the degree of variation in the generated images.
Other variants of GANs, such as CycleGAN, Progressive GAN, and VQGAN, have emerged in recent years and continue to advance the field of AI-generated imagery.
Variational Autoencoders (VAEs)
In 2013, Diederik Kingma and Max Welling introduced the concept of Variational Autoencoders (VAEs).
A VAE is another type of neural network that can create new images. But VAEs work a little differently than GANs.
With VAEs, the network looks at a bunch of pictures and tries to figure out what features they have in common. It then creates a summary that represents those common features.
After the network has created this summary, it can be used to generate new images. You start with a code that represents a random combination of the common features the network found, and then you feed that code into the network. The network then turns that code into a new image that looks similar to the original images, but not exactly the same.
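A minimal sketch of that idea, again assuming PyTorch and using random vectors in place of real pictures, might look like this:

```python
# A minimal sketch of the VAE idea, assuming PyTorch. Random vectors stand in
# for real images; sizes are arbitrary illustrative choices.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))  # 8 = mean + spread of a 4-number summary
decoder = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 16))

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3
)

for step in range(200):
    images = torch.rand(32, 16)            # stand-in for a batch of pictures
    stats = encoder(images)
    mean, log_var = stats[:, :4], stats[:, 4:]

    # Sample a compact "summary" code for each image.
    code = mean + torch.exp(0.5 * log_var) * torch.randn_like(mean)
    reconstruction = decoder(code)

    # Loss: rebuild the input well, but keep the codes close to a simple
    # bell-curve distribution so that fresh random codes also decode nicely.
    recon_loss = ((reconstruction - images) ** 2).sum(dim=1).mean()
    kl_loss = -0.5 * (1 + log_var - mean ** 2 - log_var.exp()).sum(dim=1).mean()
    loss = recon_loss + kl_loss

    optimizer.zero_grad(); loss.backward(); optimizer.step()

# Generating something new: feed a random code straight into the decoder.
new_image = decoder(torch.randn(1, 4))
```

The "summary" is the small code in the middle; the extra penalty (the KL term) keeps those codes well behaved, so that feeding the decoder a brand-new random code still produces something that looks like the training data.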
DALL-E and Stable Diffusion and Midjourney, oh my
Several applications surfaced in the last couple of years that allow users to generate images from textual descriptions. This is how the images for “Harry Potter by Balenciaga” and the “Pope in a puffer jacket” image were created. Arguably the most famous of these AI art generator models are DALL-E, Stable Diffusion, and Midjourney. (There are several others, including the free tool Artbreeder and Google’s Deep Dream.) The release of Midjourney in 2022 catapulted the technology into the mainstream consciousness.
OpenAI’s DALL-E was introduced in 2021. It uses a transformer-based neural network architecture to generate images from textual input: a VAE-style model compresses images into sequences of tokens, and a transformer learns to produce those token sequences from a text description. This combination of techniques allows DALL-E to generate high-quality images that closely match the text description while also exhibiting some creativity.
In contrast, Stable Diffusion, which was released in 2022 by Stability AI in collaboration with academic researchers, uses a diffusion-based approach.
The difference between transformer-based and diffusion-based models
Transformer-based models are a type of generative model that analyze patterns in a dataset of pictures to create new pictures.
They work by treating a picture as a sequence of small pieces, learning from many examples which pieces tend to go together, and then assembling a new picture piece by piece, with each choice informed by everything generated so far. Think of transformer-based models like a painter who creates new pictures by combining elements of the many paintings they've studied.
Diffusion-based models, on the other hand, are a newer type of generative model. During training they gradually add random noise to pictures and learn how to reverse that corruption. To generate a new image, the model starts from pure noise and removes a little of it at each step until a clean picture emerges. Think of a diffusion-based model as a sculptor who starts with a rough block and gradually refines it until the finished figure appears.
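Here is a heavily simplified toy of the diffusion idea, assuming PyTorch. It is not how Stable Diffusion is actually implemented (real systems use large denoising networks, carefully designed noise schedules, and text conditioning); it only shows the core loop of learning to undo noise and then generating by removing noise step by step:

```python
# A toy diffusion-style sketch, assuming PyTorch. Random vectors stand in
# for real images; the noise schedule is a crude linear blend.
import torch
import torch.nn as nn

data_dim, steps = 16, 10
model = nn.Sequential(nn.Linear(data_dim + 1, 64), nn.ReLU(), nn.Linear(64, data_dim))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Training: corrupt the data with a random amount of noise and teach the
# network to predict exactly which noise was added.
for _ in range(500):
    clean = torch.rand(32, data_dim)     # stand-in for real images
    level = torch.rand(32, 1)            # how much noise (0 = none, 1 = all)
    noise = torch.randn_like(clean)
    corrupted = (1 - level) * clean + level * noise
    predicted = model(torch.cat([corrupted, level], dim=1))
    loss = ((predicted - noise) ** 2).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()

# Generation: start from noise and walk the noise level down toward zero.
with torch.no_grad():
    x = torch.randn(1, data_dim)
    current = 0.95
    while current > 0:
        level = torch.full((1, 1), current)
        predicted_noise = model(torch.cat([x, level], dim=1))
        clean_guess = (x - current * predicted_noise) / (1 - current)
        current = max(current - 1 / steps, 0.0)   # a slightly cleaner level
        x = (1 - current) * clean_guess + current * predicted_noise
print(x)  # the "generated" sample
```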
Midjourney, another image generator, was created by the research lab of the same name and released in 2022. Midjourney has been tight-lipped about its model, but according to moneycontrol.com it is believed to use technology similar to Stable Diffusion, in other words a diffusion-based model.
Following in the footsteps of GPT-4's image-to-text capability, Midjourney announced its /describe feature just last week. It lets users turn an image into descriptive text prompts.
Turning AI-generated art into video
One way AI-generated art makes the jump to video is through a technique known as "deepfakes." Deepfakes involve training a deep learning model on a large dataset of images and videos of a particular person or object, and then using that model to generate new images or videos that are highly realistic and believable.
To create a deepfake video, the first step is to generate the individual images that will become the frames of the video. This can be done with techniques such as GANs or neural style transfer, guided by a model that has been trained on many images of the target person or object. To keep the motion looking natural, each new frame is generated so that it stays consistent with the frames that came before. Once all the frames have been generated, they are stitched together into a complete video. The result can be a highly realistic and believable deepfake.
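As a small illustration of that final assembly step, here is a sketch that writes a sequence of frames out as a video file, assuming the opencv-python and numpy packages are installed. Random arrays stand in for frames that would really come from a generative model:

```python
# A minimal sketch of stitching frames into a video, assuming opencv-python
# and numpy. Random noise frames stand in for model-generated images.
import cv2
import numpy as np

height, width, fps = 256, 256, 24
writer = cv2.VideoWriter(
    "generated.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height)
)

for i in range(fps * 3):  # three seconds of "video"
    # In a real pipeline this frame would come from the image model,
    # kept consistent with the previous frames so the motion looks smooth.
    frame = np.random.randint(0, 256, (height, width, 3), dtype=np.uint8)
    writer.write(frame)

writer.release()  # finalise generated.mp4
```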
A couple of years ago, a series of deepfake videos surfaced featuring a highly realistic computer-generated version of the actor Tom Cruise. The videos were created by a TikTok user named "deeptomcruise" who used AI-powered deepfake technology to superimpose the actor's face onto the body of an impersonator.
The deepfake videos gained widespread attention on social media due to their high level of realism and attention to detail, including subtle facial expressions and voice inflections. Some viewers were impressed by the technical prowess of the deepfake technology, while others raised concerns about the potential for malicious use and the impact on privacy and trust in media.
The creator of the deepfake Tom Cruise videos has stated that his intention was not to deceive or manipulate people, but rather to raise awareness of the power of deepfake technology and the need for greater scrutiny and skepticism when it comes to online content. Despite this, the videos have sparked a wider debate about the ethical implications of deepfakes and the need for regulation and safeguards to prevent their misuse.
The future of AI-generated art
It is inevitable that AI-generated imagery will keep advancing and will be increasingly adopted across a range of industries.
As AI-generated imagery tools become more sophisticated and user-friendly, they will become increasingly important for artists and designers. By automating certain aspects of the creative process, these tools can help creatives work more efficiently and free up time.
AI-generated art is already enabling artists and designers to create new forms of expression and explore new creative possibilities. As these tools continue to evolve, we may see even more innovative and boundary-pushing designs.
One of the most exciting possibilities of AI-generated imagery is the potential to blur the line between human and machine creativity. By combining the unique insights and perspectives of human artists and designers with the computational power of AI, we may be able to see truly groundbreaking works of art and design.
Ethical and societal implications of AI-generated art
As AI-generated imagery becomes more prevalent, ethical and societal implications are being discussed. For example, questions have surfaced around ownership and attribution of AI-generated works, as well as concerns around bias and fairness. The potential of deepfakes to mislead is another concern.
AI-generated imagery also has the potential to disrupt certain industries, including those that rely heavily on manual labor in image creation and editing, such as graphic design and photography. Many jobs may be at risk of automation. However, new job opportunities may also arise as the field of AI-generated imagery continues to grow and evolve. The impact on employment will depend on how companies and individuals choose to adopt and integrate these technologies into their workflow.
Overall, the future of AI-generated art is likely to be marked by both exciting new creative possibilities and important ethical and societal considerations. It will be up to artists, designers, and society at large to navigate these developments in a responsible and thoughtful manner.
In short
The history of AI-generated imaging can be traced back to the 1960s, when researchers first started exploring the use of computer algorithms to create digital images. Rule-based algorithms were initially used to create relatively low-resolution and abstract images. However, the advent of GANs and VAEs in the early 2010s represented a dramatic breakthrough, enabling the creation of complex, realistic pictures. This sparked a new wave of research and innovation. Neural networks, which are the foundation of GANs and VAEs, have undergone several phases of evolution since the 1940s, each driven by advancements in computing hardware, software, and algorithms. As we move forward, it will be interesting to see how AI-generated art continues to evolve and what new breakthroughs will be made.