A High Level Introduction to Generative Adversarial Networks

Kaitlyn Zeichick
9 min read · Mar 21, 2021

Introduction

“Generative Adversarial Networks.” It’s a mouthful, and the name alone makes it sound complicated. Generative Adversarial Networks, also known as GANs, are the magic behind fake images. Even if you’ve never heard of them, you’ve probably seen them in action. Just a few months ago, in November 2020, the New York Times published an article called “Designed to Deceive: Do These Fake People Look Real to You?” The Times used GAN software to create fake images of people, like the ones shown below.

Source: https://www.nytimes.com/interactive/2020/11/21/science/artificial-intelligence-fake-people-faces.html

The goal of the Times was to show how much facial fakery has improved over the years, to discuss bias in the models that make them, and to expose some of the flaws in GANs. But they never really delved into what GANs are and how they work.

This article aims to introduce the concept of GANs at a very high level. I won’t be stepping through any code or going into the math behind them, but hopefully by the end of this article you’ll have a good understanding of what GANs are, a couple common problems with them, some different types that exist, and some strange ways they’ve been used.

So what are Generative Adversarial Networks, and how do they work?

A GAN is a machine learning framework used to create fake images that look so realistic that human beings can’t figure out if the image is fake or not. Depending on what images are used to train the model, a GAN can be used to make all kinds of fake images, such as images of people, dogs, or chairs that don’t exist but look like they could.

The reason that GANs are so effective is because they’re made up of two neural networks. One of the neural networks is called the “Generator,” and its job is to make fake images. The other neural network is called the “Discriminator,” and its job is to classify an image as real or fake. The Discriminator takes in fake images created by the Generator and real images that were pre-classified as real, and then it classifies each image. Once it’s done, the Generator gets some feedback as to how the Discriminator knew that its fake images were fake, and the Discriminator gets feedback on how well it did. Both of them then update to become better at their respective jobs.
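To make the two roles concrete, here’s a minimal sketch in plain NumPy. These are toy one-parameter linear models standing in for real neural networks (an actual GAN would use deep networks for both), so treat this as an illustration of the roles, not a working GAN:

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, w, b):
    # The Generator maps random noise z to fake samples.
    # Here it's a toy linear model with parameters w and b.
    return w * z + b

def discriminator(x, v, c):
    # The Discriminator scores each sample with the probability
    # that it is real, using a toy logistic model.
    return 1.0 / (1.0 + np.exp(-(v * x + c)))

z = rng.standard_normal(5)          # random noise input
fake = generator(z, w=1.0, b=0.0)   # fake samples from the Generator
scores = discriminator(fake, v=1.0, c=0.0)
print(scores)                       # five probabilities in (0, 1)
```

Training is then a matter of nudging `w, b` so the scores rise (the Generator fools the Discriminator) and nudging `v, c` so they fall on fakes and rise on real data.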

In a way, the Discriminator and the Generator are fighting against each other. The Discriminator gets better and better at figuring out which images are fake and were made by the Generator, like a teacher getting better at spotting plagiarized essays. But each time the student gets caught plagiarizing, he gets better at disguising his plagiarism. The two networks are adversaries, which is why the model is called a Generative Adversarial Network.

Now let’s get a little bit more into the nitty-gritty of how this is working. The Generator and the Discriminator aren’t being trained at exactly the same time. Instead, the GAN is flip-flopping between training the Generator and training the Discriminator.

To train the Generator, the Generator first takes in random noise as input. Its initial outputs are terrible, obviously fake images. The Discriminator then takes in the fake, generated images alongside real images and classifies each as real or fake. The resulting loss then backpropagates (a way of moving backward through the model while updating it) through the Discriminator and the Generator to obtain gradients. Gradients are numbers that tell us how we need to update a neural network’s weights to reduce its error, so that it performs better on its next go-round. The gradients are then used to update the Generator’s weights.
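One Generator update can be sketched numerically. This toy example (my own illustration, not code from any GAN library) uses a one-parameter generator, a fixed sigmoid discriminator, and a finite-difference gradient standing in for backpropagation:

```python
import numpy as np

def d(x):
    # Fixed toy discriminator: probability that x is "real".
    return 1.0 / (1.0 + np.exp(-x))

def gen_loss(w, z):
    # The Generator wants the Discriminator to say "real" (1)
    # on its fakes, i.e. it minimizes -log D(G(z)).
    fake = w * z                # toy generator: G(z) = w * z
    return -np.log(d(fake)).mean()

z = np.array([0.5, 1.0, 1.5])   # random noise inputs
w = 0.1                         # current generator weight
eps = 1e-6
# Finite-difference gradient: how the loss changes as w changes.
grad = (gen_loss(w + eps, z) - gen_loss(w - eps, z)) / (2 * eps)
w_new = w - 0.1 * grad          # one gradient-descent step
print(w_new > w)                # loss falls as w grows here, so True
```

In a real GAN the gradient comes from backpropagation through both networks rather than finite differences, but the update rule is the same idea: move the weights in the direction that lowers the Generator’s loss.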

Training the Generator

To train the Discriminator, the Discriminator first classifies the real and fake images. The Discriminator Loss then penalizes the Discriminator whenever it classifies an image incorrectly, and it backpropagates from the Discriminator Loss through the Discriminator to update its weights.
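A sketch of one Discriminator step, again with a toy one-parameter model and a finite-difference gradient standing in for backpropagation:

```python
import numpy as np

def d(x, v):
    # Toy one-parameter discriminator: sigmoid(v * x).
    return 1.0 / (1.0 + np.exp(-v * x))

def disc_loss(v, real, fake):
    # Binary cross-entropy: penalize calling reals fake (first term)
    # and fakes real (second term).
    eps = 1e-12
    return -(np.log(d(real, v) + eps).mean()
             + np.log(1.0 - d(fake, v) + eps).mean())

real = np.array([2.0, 2.5, 3.0])    # samples from the real data
fake = np.array([-1.0, -0.5, 0.0])  # samples from the Generator
v = 0.0                             # current discriminator weight
h = 1e-6
grad = (disc_loss(v + h, real, fake) - disc_loss(v - h, real, fake)) / (2 * h)
v_new = v - 0.5 * grad              # one gradient step on D only
print(disc_loss(v_new, real, fake) < disc_loss(v, real, fake))  # True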

Training the Discriminator

Once the Discriminator starts being fooled by the Generator about half the time, the training stops. This stopping point is important: if the GAN continues to train, the Discriminator starts giving essentially random feedback, the Generator trains on that noise, and the quality of the images it creates degrades. Once we’ve stopped, we can throw away the Discriminator and keep our trained Generator to make realistic synthetic images for us.

GAN Problems

There are a couple interesting problems that GANs can run into. The first is called the Vanishing Gradient Problem, and the second is called Mode Collapse.

In the Vanishing Gradient Problem, the Discriminator is too good. It isn’t fooled by the Generator, and the resulting gradients are so small that it doesn’t give the Generator enough information to improve.
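A quick numeric illustration, assuming a sigmoid discriminator: for the original minimax generator loss log(1 − D(x)), the gradient with respect to a fake sample x works out to −sigmoid(x), which shrinks toward zero as the Discriminator grows more confident the sample is fake. (A common workaround, not discussed above, is the non-saturating loss −log D(G(z)).)

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# d/dx log(1 - sigmoid(x)) = -sigmoid(x): the gradient signal that
# would reach the Generator through a sigmoid Discriminator.
for logit in [0.0, -5.0, -20.0]:   # D grows more confident the sample is fake
    d_fake = sigmoid(logit)        # Discriminator's "real" probability
    grad = -d_fake                 # gradient reaching the Generator
    print(f"D(fake)={d_fake:.2e}  gradient={grad:.2e}")
```

By the last line the gradient is vanishingly small: a near-perfect Discriminator gives the Generator almost nothing to learn from.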

The Vanishing Gradient Problem: When the Discriminator is too good

In Mode Collapse, the Generator starts producing the same image over and over again because it’s an image that has tricked the Discriminator in the past. In this case, the Discriminator should learn to reject that repeated image, but sometimes it can get stuck in a local minimum and can’t figure out that strategy. So the generated images end up all being the same image or a small set of images.

Mode Collapse: All of the fake images are the same

Variations of GANs

There are quite a few variations of GANs, but I’ll only be going over two of the ones that I think are particularly interesting: CycleGANs and Stacked GANs.

CycleGANs are a variation of GANs that can do image-to-image translation. So you can do things like take an image of a horse and turn it into a zebra (and vice versa), or take an image of a winter landscape and turn it into a summer one.

Source: https://towardsdatascience.com/image-to-image-translation-69c10c18f6ff

CycleGANs are really cool because instead of using one GAN, they use two. The first one takes in images from some collection, like images of zebras, and its Generator outputs fake images from some other collection, like fake images of horses. The Discriminator tries to distinguish between the fake horses and pictures of actual horses.

The second GAN takes in pictures of horses and outputs fake zebras, and its Discriminator tries to distinguish between the fake zebras and the actual zebras.

A CycleGAN

The way CycleGANs get so good is that the creators of the model added something called ‘Cycle Consistency Loss.’ The first Generator takes in a picture of a zebra and generates a picture of a horse, but then, as an extra step, that generated photo of a horse becomes the input for Generator 2. Generator 2 then makes a picture of a zebra based on the picture of the fake horse. The original input to Generator 1 and the output of Generator 2 are then compared, and ideally they’re the same picture. The same is also done in reverse, where Generator 2 makes an image that is then used as the input for Generator 1.
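The cycle consistency idea can be sketched with toy stand-in functions for the two generators (real CycleGAN generators are deep convolutional networks, and the loss below uses the L1 distance the CycleGAN paper uses):

```python
import numpy as np

# Toy stand-ins for the two trained generators. They happen to
# invert each other exactly, which is what cycle consistency
# pushes the real networks toward.
def g_zebra_to_horse(img):
    return img + 0.1    # toy "zebra -> horse" translation

def g_horse_to_zebra(img):
    return img - 0.1    # toy "horse -> zebra" translation

zebra = np.random.default_rng(0).random((4, 4))  # toy "zebra image"
reconstructed = g_horse_to_zebra(g_zebra_to_horse(zebra))

# Cycle consistency loss: mean L1 distance between the original
# image and its round-trip reconstruction. Near zero here because
# the toy generators invert each other.
cycle_loss = np.abs(zebra - reconstructed).mean()
print(cycle_loss)
```

During training this loss is added to the usual adversarial losses, so each generator is rewarded both for fooling its Discriminator and for being undone by the other generator.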

Cycle Consistency Loss

Another really cool thing you can do with GANs is text to image synthesis. So you can type in something like “This bird is red and brown in color, with a stubby beak,” and the GAN will generate an image that looks like that description.

Text to Image Synthesis using StackedGANs. Source: https://paperswithcode.com/task/text-to-image-generation

They do this by using what’s called a Stacked Generative Adversarial Network. Instead of going straight from text to image with a single GAN, the authors broke the problem up across two GANs. The first is called the Stage-I GAN, and as you can see in the picture above, it sketches the shape and basic colors of the object described in the text, producing a low-resolution image. That image is then fed into the Stage-II GAN, which fills in the details of the object and produces a high-resolution, photo-realistic image.

Strange GAN Applications

The first strange GAN application started with a variation of GAN called BigGAN, which was made by Google. BigGAN can generate images for each of the 1,000 categories in the standard ImageNet dataset. The main difference between an ordinary GAN and BigGAN is scale: since Google made it, they were able to throw an enormous amount of computational power at the problem. Some people have estimated that today it would take around $60k in cloud computing time to train your own BigGAN model.

There aren’t too many practical applications of BigGAN. NVIDIA showed that BigGAN can make pixelated images clearer, and some people think it could be useful for making art. It was primarily made for research purposes, but one thing that came out of it that I think is really funny is a website called GANBreeder. GANBreeder took advantage of the fact that each category in BigGAN corresponds to a point in a vector space, so you can find in-between categories. The app lets you pick two categories, like flower and dog, and average them to get categories that don’t exist in ImageNet, like dog-flowers.
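The “breeding” trick can be sketched as simple vector interpolation. The vectors below are random stand-ins, not BigGAN’s real class embeddings, and the dimension 128 is arbitrary:

```python
import numpy as np

# Each ImageNet category corresponds to an embedding vector, and
# in-between categories come from interpolating those vectors.
rng = np.random.default_rng(42)
dog = rng.standard_normal(128)     # toy class embedding for "dog"
flower = rng.standard_normal(128)  # toy class embedding for "flower"

alpha = 0.5                        # 0.0 = pure dog, 1.0 = pure flower
dog_flower = (1 - alpha) * dog + alpha * flower  # the "dog-flower" point
print(dog_flower.shape)            # (128,)
```

Feeding an in-between point like this to the trained generator is what produces the hybrid images GANBreeder is known for; sliding `alpha` smoothly morphs one category into the other.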

An image using BigGAN. Source: https://aiweirdness.com/post/182322518157/welcome-to-latent-space

The second strange use of a GAN was by a group of researchers in Paris who called themselves Obvious. In 2018 they created a series of paintings that they called La Famille de Belamy. They chose the name Belamy because bel ami is French for “good friend,” a play on the name of Ian Goodfellow, the inventor of GANs. One of those paintings, Edmond de Belamy, pictured below, sold at auction for $432,500. This led to research trying to answer the still-unresolved question of who deserves credit for having created the painting: the person who made the model, the person who made use of the model, the artists whose paintings were used as inputs to the model, or the model itself?

Edmond De Belamy. Source: https://en.wikipedia.org/wiki/Edmond_de_Belamy

There are a bunch of other strange use cases out there that I won’t go into but that I’d encourage anyone to go find. For example, some researchers used a GAN variation to make amateur dancers look like they were dancing like professional dancers. I like to think this was done by a data scientist with terrible dancing skills and a desire to fake his dance abilities on social media.

GAN Sources I Found Useful

A gentle introduction to CycleGANs: https://machinelearningmastery.com/what-is-cyclegan/

A blog about BigGAN and GANBreeder: https://aiweirdness.com/post/182322518157/welcome-to-latent-space

This GAN powered website for combining images (previously GANBreeder): https://www.artbreeder.com/

Google’s introduction to GANs: https://developers.google.com/machine-learning/gan

A blog that used a GAN to make fake Nike shoes, and in their intro explained GANs and how they work: https://bolster.ai/blog/gans-in-real-world-can-bad-actors-use-gans-to-beat-ai/

This longer paper about how to prevent the Discriminator from getting too good too quickly: https://arxiv.org/pdf/1710.10196.pdf

This article about using GANs to create synthetic medical images: https://towardsdatascience.com/harnessing-infinitely-creative-machine-imagination-6801a9fb4ca9

An article that shows a lot of creative ways GANs have been used: https://freshlybuilt.com/gans/

A paper with an explanation of StackGANs: https://openaccess.thecvf.com/content_ICCV_2017/papers/Zhang_StackGAN_Text_to_ICCV_2017_paper.pdf

A great tutorial for using a GAN to make fake images of numbers using the MNIST dataset: https://github.com/uclaacmai/Generative-Adversarial-Network-Tutorial/blob/master/Generative%20Adversarial%20Networks%20Tutorial.ipynb

A tutorial explaining GANs: https://realpython.com/generative-adversarial-networks/

A great CycleGAN use case: https://github.com/awentzonline/image-analogies
