cGAN: How to Boss Around Your AI

2025-11-28

The Problem with Vanilla GANs

So, you built a GAN (from my previous post). It generates handwritten digits. Cool!

But here's the catch: You ask it for a number, and it gives you a random 7. You ask again, it gives you a 3. You scream "GIVE ME A 9!", and it calmly hands you a 1.

Vanilla GANs are like a talented but chaotic artist. They can draw, but they don't take requests.


Enter cGANs (Conditional Generative Adversarial Networks).

What is a cGAN?

A Conditional GAN is just a regular GAN that went to obedience school. Instead of just feeding random noise to the Generator, we feed it Noise + A Condition (like a class label).

And we don't just tell the Generator. We also tell the Discriminator: "Hey, check if this image is real AND if it matches this label."

The Analogy

Vanilla GAN: You walk into a coffee shop and say "Give me a drink." You might get coffee, tea, or water. Who knows?
cGAN: You walk in and say "Give me a Coffee." Now, if they give you tea, you know it's wrong, even if it's a perfectly good cup of tea.

The Architecture Change

It's actually a surprisingly simple tweak.

Generator: Takes Noise (z) + Label (y) $\rightarrow$ Generates Image.
Discriminator: Takes Image (x) + Label (y) $\rightarrow$ Real or Fake?

[Figure: cGAN architecture diagram]

The Math (Still Don't Panic)

Remember our Minimax game? We just add a little $y$ (the condition) to the probability terms.

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)} [\log D(x|y)] + \mathbb{E}_{z \sim p_{z}(z)} [\log (1 - D(G(z|y)))]$$

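Basically, everything is now "given $y$": the Generator draws an image for label $y$, and the Discriminator judges the image together with that same label.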
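
If you want to see how that formula turns into code: the two expectation terms are what binary cross-entropy computes (up to a sign) when real pairs get target 1 and fake pairs get target 0. Here is a minimal sketch, assuming the Discriminator outputs a probability. The tensors `d_real` and `d_fake` are made-up numbers standing in for $D(x|y)$ and $D(G(z|y))$, not outputs of a trained model.

```python
import torch
import torch.nn.functional as F

# Made-up discriminator outputs (probabilities), standing in for
# D(x|y) on real pairs and D(G(z|y)) on generated pairs.
d_real = torch.tensor([0.9, 0.8, 0.7])
d_fake = torch.tensor([0.2, 0.1, 0.4])

# The value function V(D, G), estimated on this tiny batch.
value = torch.log(d_real).mean() + torch.log(1 - d_fake).mean()

# The same quantity via binary cross-entropy: real -> 1, fake -> 0.
bce_real = F.binary_cross_entropy(d_real, torch.ones_like(d_real))
bce_fake = F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
d_loss = bce_real + bce_fake  # minimizing this is maximizing V for D

print(torch.allclose(d_loss, -value))
```

So "D maximizes V" and "D minimizes BCE loss" are the same statement, which is why GAN training code tends to be written with a BCE loss rather than raw logs.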

Code Example: The "Concatenation" Trick

In PyTorch, the easiest way to "feed" the label is to turn it into a vector (using an Embedding) and just glue it (concatenate) onto the noise for the Generator, or onto the flattened image for the Discriminator.
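
Before wiring this into a network, it may help to watch the shapes. This is a small sketch of just the embed-and-concatenate step; the sizes (batch of 4, noise of 64, embedding of 10) are illustrative assumptions, not values the trick depends on.

```python
import torch
import torch.nn as nn

# Illustrative sizes only.
batch_size, z_dim, num_classes, embed_dim = 4, 64, 10, 10

label_embed = nn.Embedding(num_classes, embed_dim)

noise = torch.randn(batch_size, z_dim)   # shape (4, 64)
labels = torch.tensor([3, 1, 4, 1])      # integer class ids (int64)

label_vector = label_embed(labels)       # shape (4, 10)
combined = torch.cat([noise, label_vector], dim=1)  # shape (4, 74)

print(label_vector.shape, combined.shape)
```

Note that the two samples with label 1 get the exact same label vector. The embedding is a learned lookup table, one row per class.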

1. The Generator

We take the noise vector and the label vector, mash them together, and feed that into the network.

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim, num_classes, img_dim):
        super().__init__()
        self.img_dim = img_dim
        
        # Create an embedding for the label (e.g. digit 0-9)
        self.label_embed = nn.Embedding(num_classes, 10) # Map label to vector of size 10
        
        self.gen = nn.Sequential(
            # Input size is Noise + Label Embedding
            nn.Linear(z_dim + 10, 256), 
            nn.LeakyReLU(0.1),
            nn.Linear(256, img_dim),
            nn.Tanh(),
        )

    def forward(self, noise, labels):
        # 1. Turn the label (e.g., "5") into a vector
        label_vector = self.label_embed(labels)
        
        # 2. Glue it to the noise
        combined_input = torch.cat([noise, label_vector], dim=1)
        
        # 3. Generate!
        return self.gen(combined_input)
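
Once a generator like this is trained, "asking for a 9" is just a matter of passing a batch of 9s as labels. Here is a hedged sketch: `sample_digit` is a hypothetical helper (not from any library), and `TinyGen` is a stand-in with the same `forward(noise, labels)` signature so the snippet runs on its own. It is untrained, so the output is noise, and the 28x28 reshape assumes MNIST-sized images.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def sample_digit(gen, digit, n_samples, z_dim):
    """Hypothetical helper: ask a conditional generator for one class."""
    noise = torch.randn(n_samples, z_dim)
    # nn.Embedding expects integer class ids (int64).
    labels = torch.full((n_samples,), digit, dtype=torch.long)
    return gen(noise, labels)

# Stand-in with the same forward(noise, labels) signature as the
# Generator above. Untrained, so its outputs are just noise.
class TinyGen(nn.Module):
    def __init__(self, z_dim=64, num_classes=10, img_dim=784):
        super().__init__()
        self.label_embed = nn.Embedding(num_classes, 10)
        self.net = nn.Sequential(nn.Linear(z_dim + 10, img_dim), nn.Tanh())

    def forward(self, noise, labels):
        return self.net(torch.cat([noise, self.label_embed(labels)], dim=1))

nines = sample_digit(TinyGen(), digit=9, n_samples=16, z_dim=64)
images = nines.view(-1, 1, 28, 28)  # assumes 28x28 images (img_dim = 784)
print(images.shape)
```

The noise still varies per sample, so you get sixteen different 9s rather than sixteen copies of one.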

2. The Discriminator

The Discriminator needs to see the image AND the label to know if they match.

class Discriminator(nn.Module):
    def __init__(self, img_dim, num_classes):
        super().__init__()
        self.label_embed = nn.Embedding(num_classes, 10)
        
        self.disc = nn.Sequential(
            # Input size is Image + Label Embedding
            nn.Linear(img_dim + 10, 128),
            nn.LeakyReLU(0.1),
            nn.Linear(128, 1),
            nn.Sigmoid(),
        )

    def forward(self, image, labels):
        # 1. Turn label into vector
        label_vector = self.label_embed(labels)
        
        # 2. Glue it to the image (flattened to shape [batch, img_dim])
        image = image.flatten(start_dim=1)
        combined_input = torch.cat([image, label_vector], dim=1)
        
        # 3. Judge it!
        return self.disc(combined_input)
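
To see how the two networks play together, here is a rough sketch of one training step. Everything in it is an assumption for illustration: `CondMLP` is a compact stand-in for both networks, `train_step` is a hypothetical helper, the batch is random tensors instead of real data, and the generator loss is the common non-saturating variant rather than the literal minimax term.

```python
import torch
import torch.nn as nn

class CondMLP(nn.Module):
    """Stand-in for both networks: embeds the label, concatenates it to x."""
    def __init__(self, in_dim, out_dim, final_act, num_classes=10):
        super().__init__()
        self.label_embed = nn.Embedding(num_classes, 10)
        self.net = nn.Sequential(
            nn.Linear(in_dim + 10, 128), nn.LeakyReLU(0.1),
            nn.Linear(128, out_dim), final_act,
        )

    def forward(self, x, labels):
        return self.net(torch.cat([x, self.label_embed(labels)], dim=1))

def train_step(gen, disc, opt_g, opt_d, real, labels, z_dim):
    bce = nn.BCELoss()
    batch = real.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator: real (image, label) pairs -> 1, fake pairs -> 0.
    fake = gen(torch.randn(batch, z_dim), labels)
    d_loss = bce(disc(real, labels), ones) + bce(disc(fake.detach(), labels), zeros)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: get the (fake image, label) pair judged as real.
    g_loss = bce(disc(fake, labels), ones)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

z_dim, img_dim = 64, 784  # illustrative sizes
gen = CondMLP(z_dim, img_dim, nn.Tanh())
disc = CondMLP(img_dim, 1, nn.Sigmoid())
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)

# Random tensors in place of a real dataset batch.
real = torch.rand(32, img_dim) * 2 - 1
labels = torch.randint(0, 10, (32,))
d_loss, g_loss = train_step(gen, disc, opt_g, opt_d, real, labels, z_dim)
print(d_loss, g_loss)
```

The important detail is that the same `labels` tensor goes to both networks. The Discriminator only ever sees real images paired with their true labels, so a sharp "7" submitted under the label 9 looks nothing like the real pairs it was trained on, and it can learn to reject it.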

Why is this cool?

Because now you have Control.

In a vanilla GAN trained on faces, you just get "a face". With a cGAN, you could instead condition on attributes like "Glasses", "Blonde Hair", "Smiling".

Then, with a thin wrapper around the trained Generator, you could call something like generate(smiling=True, glasses=True) and boom.
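
One way that could look, as a sketch only: with several attributes at once, a single class id no longer fits, so here the condition is a multi-hot vector concatenated straight onto the noise (no Embedding). The attribute list, `AttrGenerator`, and `generate` are all hypothetical names made up for this example, and the network is untrained.

```python
import torch
import torch.nn as nn

ATTRIBUTES = ["glasses", "blonde_hair", "smiling"]  # hypothetical attribute set

class AttrGenerator(nn.Module):
    """Generator conditioned on a multi-hot attribute vector."""
    def __init__(self, z_dim=64, img_dim=3 * 64 * 64):
        super().__init__()
        self.z_dim = z_dim
        self.net = nn.Sequential(
            nn.Linear(z_dim + len(ATTRIBUTES), 256), nn.LeakyReLU(0.1),
            nn.Linear(256, img_dim), nn.Tanh(),
        )

    def forward(self, noise, attrs):
        return self.net(torch.cat([noise, attrs], dim=1))

@torch.no_grad()
def generate(gen, n_samples=1, **flags):
    unknown = set(flags) - set(ATTRIBUTES)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    # Multi-hot condition: 1.0 where the attribute is requested, else 0.0.
    row = [1.0 if flags.get(name, False) else 0.0 for name in ATTRIBUTES]
    attrs = torch.tensor(row).repeat(n_samples, 1)  # shape (n_samples, 3)
    noise = torch.randn(n_samples, gen.z_dim)
    return gen(noise, attrs)

faces = generate(AttrGenerator(), n_samples=4, smiling=True, glasses=True)
print(faces.shape)
```

For this to mean anything, the training data needs attribute annotations per image, and the Discriminator has to receive the same attribute vector alongside each image.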


Conclusion

cGANs are the bridge between "AI doing random cool stuff" and "AI doing what we actually want." It's a small math change, but a huge leap in usability.

Next up? Maybe we'll look at CycleGAN (turning horses into zebras). Stay tuned!
