Using ML to understand the behavior of an AI

This blog post continues and explores the learnings shared in the previous post, Using ML to detect fake face images created by AI, but focuses on understanding the features of AI-generated face images and their implications.

The three-part blog series describing this project consists of the following posts:

  1. Using ML to detect fake face images created by AI
  2. Using ML to understand the behavior of an AI
  3. How long and how much to train an ML-classifier.

Head to this simple web app to try this ML-classifier in action. The ML-model will present its prediction for face images you upload for it to classify.



In short, these are the key findings:

  1. When the original training images for the Nvidia AI face-generator, the Flickr-Faces-HQ Dataset, were used to train this ML-classifier to detect whether a face image is AI-generated, the model accuracy got slightly worse. A few more AI-generated images were predicted as coming from the real world.
  2. When those original Flickr images were instead used as a Test set, this ML-classifier was doomed to a disastrous accuracy of 29%. This ought to be the very hardest Test set out there, leading to the conclusion: the hardest images to predict are the original images used to train the AI-generator.
  3. The findings suggest that the Flickr dataset is most probably biased toward white, middle-aged, round faces with glasses. These are the kinds of images that this ML-classifier wrongly predicts as being AI-generated even though they actually come from the real world.

Small recap

This section is a small recap of the previous blog post. Go ahead and skip it if you don't need the recap; you don't need it fresh in your mind to be able to follow along.

The previous blog post gives a broad picture and understanding of the experimentation and thought trail while building this ML-classifier to detect AI-generated face images.

I've built a classifier to predict whether or not a face image is generated by an AI, and I worked to improve it for production, reaching an accuracy of 99,99% on my Test set of unseen images comprising 202 000 real-world and 30 000 AI-generated images. The training consisted of 43 000 real-world and 30 000 AI-generated images, of which 20% went into the Validation set, which reached an accuracy of 99,21%.
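In concrete numbers, the splits above work out as follows. This is a quick back-of-the-envelope check in Python, not part of the original training pipeline; the rounding of expected errors is mine:

```python
# Training pool: 43 000 real-world + 30 000 AI-generated images
train_pool = 43_000 + 30_000            # 73 000 images in total
val_size = int(train_pool * 0.20)       # 20% held out -> 14 600 validation images
train_size = train_pool - val_size      # 58 400 images used for weight updates

# Test set (production simulation): 202 000 real + 30 000 AI-generated
test_size = 202_000 + 30_000            # 232 000 unseen images

# At 99,99% accuracy, the expected number of misclassified images is roughly:
expected_errors = round(test_size * (1 - 0.9999))

print(train_size, val_size, expected_errors)  # → 58400 14600 23
```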

This subject sparked my interest when Nvidia in December 2018 released a StyleGAN resulting in an AI-generator that can produce very realistic fake human faces. Scary realistic, actually. You can check out the Nvidia AI-generated images for yourself, or have a closer look below at this generated image retrieved from the site, a person who obviously doesn't exist in flesh and blood:

A very special dataset of images

I had an interesting question poking at my curious mind that I really wanted to answer. In Machine Learning, the journey is all about iterations of experimentation to gain insight and knowledge about a particular problem and/or domain, so that's what I went ahead and did.

The question is related to understanding how StyleGAN neural nets work. For a technical but easily digestible walkthrough, head to Towards Data Science and learn how high resolution face images can be produced without loss of quality, by a technique represented by the image collage below. It explains how source images can be used to create new face images, here called destination images.

I wanted to know what would happen if the Flickr-Faces-HQ Dataset got thrown into the game. This dataset of 70 000 real-world face images was collected by Nvidia from Flickr and used in the training of the StyleGAN that gave Nvidia its AI for generating fake face images.

Specifically, the question drills down into two separate parts:

  1. If the ML-classifier were trained with the Flickr images, would that improve the accuracy? Or, at the very least, not degrade the model?
  2. Is it easier or harder for the ML-classifier to predict that a Flickr image comes from the real world (the classifier, of course, not having been trained with the Flickr images)?

The original training data for the AI-generator as training data for ML-classifier detection

The accuracy on the Test set (simulating production) for my ML-model not trained with the Flickr images is 99,99%, and it wrongly predicted 17 of the 30 000 AI-generated images as real-world faces. As shown below, when 11 000 of the original 70 000 Flickr images (approx. 1/7 of the dataset) were added to the training of the ML-classifier, the result stayed more or less the same: the accuracy landed on 99,98%, with 53 AI-generated images wrongly classified as coming from the real world.

Still, 53 errors is a few more than 17, which indicates the ML-classifier does slightly worse when the original images that trained the face-generating AI are used in training to detect AI-generated face images. Sorry for the meta language describing this phenomenon; it sounds confusing even to me writing it.
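To put the 17-versus-53 comparison in perspective, the corresponding error rates on the 30 000 AI-generated test images can be computed directly. A small illustrative calculation, not taken from the original notebook:

```python
fakes_in_test = 30_000

errors_without_flickr = 17   # fakes predicted as real; no Flickr in training
errors_with_flickr = 53      # fakes predicted as real; Flickr in training

rate_without = errors_without_flickr / fakes_in_test  # ≈ 0.057% error rate
rate_with = errors_with_flickr / fakes_in_test        # ≈ 0.177% error rate

# Both rates are tiny, but the relative increase is roughly threefold:
increase = errors_with_flickr / errors_without_flickr
print(f"{rate_without:.3%} -> {rate_with:.3%} ({increase:.1f}x)")
```

So even though both models look near-perfect in accuracy terms, the error rate on fakes roughly tripled once the Flickr images entered training.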

Before moving on, to close this part of the experimentation: what can't be said with any confidence is how the Flickr dataset affected the prediction of the real-world images. A difference between wrongly predicting 7 or 3 images can be just noise in the neural net, rather than a learning effect. I wouldn't say it became better, but from the preliminary look of it, I am satisfied to see that it didn't get worse.

Prediction of Test set by the ML-classifier after training with the Flickr dataset.
Prediction of Test set by the ML-classifier after training without the Flickr dataset.

The original training data for the AI-generator as Test set for the ML-classifier

The next part of the experimentation was to see what would happen when 11 000 of the images in the Flickr dataset became the Test set. This required resetting the previous part of the experimentation and switching back to the version of the ML-classifier that was not trained with the Flickr dataset.

I could not make up my mind beforehand about what would actually happen with the prediction. Can you? Now is the time to take a moment and run through the possibilities in your head to make a qualified guess, if you like. I thought it could go either way. Below is a sample from the Flickr-Faces-HQ Dataset.

Maybe the AI-generated images all share the same features as the original real-world images from the Flickr dataset? That could make this classifier predict these particular real-world images as being generated by AI and therefore fail miserably. The prediction accuracy on the Test set would then score very low.

Or, the features represented by the Flickr dataset are the features my ML-classifier had been seeing, and learning to sort out, while training on the AI-generated images. If that is the case, it would be easy to detect these features in the Test set, resulting in an accuracy that improves on, or stays level with, the previously attained 99,99%.

Ready to find out how it went?

Below is the Confusion Matrix showing the result of the experiment. Also shown are the Top Losses of the classifier, meaning the images the model was most confused about.

The very hardest Test set

Ohoh, without a doubt, this was very, very hard for the classifier: 29% accuracy in prediction!
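From a confusion matrix, accuracy is simply the diagonal divided by the total. The counts below are hypothetical, chosen only to reproduce the roughly 29% figure on an 11 000-image Flickr Test set where every true label is real-world:

```python
# Rows are true classes, columns predicted classes. Since the whole Test
# set consists of real-world Flickr photos, only the "real" row has counts.
confusion = {
    ("real", "fake"): 7_810,  # real faces wrongly flagged as AI-generated
    ("real", "real"): 3_190,  # real faces correctly classified
}

total = sum(confusion.values())        # 11 000 test images
correct = confusion[("real", "real")]  # the diagonal
accuracy = correct / total

print(f"{accuracy:.0%}")  # → 29%
```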

The conclusion must be:

the hardest images to predict are the original images used to train the AI-generator.

The logic that follows is that the features the AI-generator uses to create new face images are very specifically represented by the original features of the Flickr dataset. One speculation about the images representing the Top Losses shown above is that the facial features we see are the features the AI-generator prefers to use when creating fake faces. I don't know about you, but I see a common denominator: white, middle-aged, rounder faces. And glasses; there is a heavy overrepresentation of faces with glasses that this ML-classifier thinks are fake and generated by AI.

Implications of fake images generated by the current AI technology

Let’s pause for a moment to think about the implications. Ethics in tech is very close to my heart. Here we see it at play: what we put in, is what we get out.

This likely means that Nvidia's StyleGAN AI-generator will generate images where under-represented faces are missing, or perhaps end up putting the features of a white man's eyes into a black man's face. This could be tested with the ML-classifier I've created. If, for example, we run a Test set with only faces of, let's say, Ecuadorian people, and the accuracy becomes far worse than the 99,99% accuracy on the final Test set of 202 000 celebrities, then we would know there are biases built into Nvidia's face AI-generator.
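The bias test proposed above boils down to comparing per-group accuracies on real faces only. A minimal sketch of that comparison; the group names, predictions, and counts are entirely hypothetical, only the method is the point:

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

# Every image in both hypothetical Test sets is a real-world face, so any
# "fake" prediction is an error. A large accuracy gap would indicate bias.
test_sets = {
    "celebrity baseline": ["real"] * 9_999 + ["fake"] * 1,
    "under-represented group": ["real"] * 9_000 + ["fake"] * 1_000,
}
for group, predictions in test_sets.items():
    labels = ["real"] * len(predictions)
    print(group, f"{accuracy(predictions, labels):.2%}")
```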

It's been stressed by many, both within the Data Science community and outside the field, that, for example, the current state-of-the-art face recognition software using AI and ML-algorithms is systematically biased. For white males, the prediction accuracy is 99%, while for white females it only reaches 93%. Most disturbing is that only 65% of dark-skinned females are identified correctly (* see Disclaimer below). The Flickr dataset has most probably not been selected to represent the human race with equal weights. This happens more often than any of us wants, since Data Scientists are often bound to use the data already available to the public. It's not about purposely not doing the right thing; it's many times hard to do the right thing when your choices are limited by the resources at hand. Something we should all keep in the back of our heads while navigating the datasets we use to train our ML and AI algorithms.

Some critics point out that these numbers representing the built-in biases are themselves biased, since they depend on how you do the calculations. I agree; maybe it's not 99% vs 93% vs 65%. But we have collectively gathered enough data and evidence to know it is not equal, no matter how you do the calculations.

The End

To offer a final conclusion about this ML-classifier, I'll circle back to the previous blog post and the result of 99,99% accuracy for this classifier on the final Test set, my production-simulated environment.

The final distribution between AI-generated and real-world face images landed on 30/70 percent. My intuition guided me along the way, giving me a feeling that the core of the problem at hand has been to present the ML-model with pattern learnings about real-world images. It was much later in the experimental process, when seeing how the classifier failed so miserably on the original real-world images from the Flickr dataset, that hard data emerged to support this initial gut feeling.

After this experiment, another possible future improvement may be added to make the final ML-classifier perform even better: placing more training focus on images with features similar to the Flickr dataset (white, middle-aged, round faces with glasses), where evidently close to 2/3 were predicted as being fake and AI-generated.
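One straightforward way to place that extra training focus is to oversample the hard, Flickr-like images when drawing training batches. A minimal stdlib sketch; the file names and the 3× weight are assumptions of mine, not values from the actual project:

```python
import random

random.seed(7)  # reproducible sketch

# Hypothetical training pool: ordinary real-world images plus the hard,
# Flickr-like ones (white, middle-aged, round faces with glasses).
easy = [f"real_{i}.jpg" for i in range(8)]
hard = [f"flickr_like_{i}.jpg" for i in range(2)]

images = easy + hard
weights = [1] * len(easy) + [3] * len(hard)  # hard images drawn 3x as often

# Each training batch is drawn with replacement according to the weights,
# so the hard examples show up more often than their share of the pool.
batch = random.choices(images, weights=weights, k=6)
```

In practice, deep learning frameworks offer the same idea as a weighted sampler over the training set, so each epoch naturally sees more of the difficult examples.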

One last thing: another fun experiment, which I didn't do, would be to improve the ML-classifier to predict other AI-generated images besides faces, such as cars or cats. As it stands, the ML-classifier predicted all 10 of the 10 AI-generated cats I fed it as being almost 100% certain to come from the real world. This tells us that my classifier knows these aren't AI-generated faces, and everything else is put into the category of real-world images. If you are not mindful of your time, the Internet can easily consume you. Here is a place showcasing many more things a GAN network can generate for you, if you dare and are curious: This X Does Not Exist.
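The cat behaviour follows from the classifier only knowing two classes: anything that doesn't light up the learned "AI-generated face" features falls into the real-world bucket. A toy two-class softmax illustrates this; the logit values below are hypothetical:

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to one."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for a cat photo: nothing in it resembles the
# learned AI-generated-face features, so the "fake face" logit is very low.
fake_logit, real_logit = -6.0, 4.0
p_fake, p_real = softmax([fake_logit, real_logit])

# p_real ends up close to 1: "almost 100% certain it comes from the real
# world", even though the image is, in fact, AI-generated.
```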

At this point, though, I feel content, and very proud of my result. The next ML adventure is waiting. And who could resist sending you off with anything less than a cute cat image? This one, a fake one.

Here I’m signing off,
for the Jayway Blog,

Silvia Man,
Senior software engineer


The classifier I've built is a variation of the fastai lab covered in Lesson 2 of the course Deep Learning for Coders, which I expanded quite a bit beyond its scope into the experimentation covered in this blog post.

Face datasets for training (and validation)

AI-generated – a total of 33 000 images

  1. The 1 million Fake Faces dataset – used 30 000 images
  2. The Generated Data Dataset – used 3 000 images

Real-world in the wild – a total of 43 000 images (54 000 including the Flickr dataset)

  1. The UTKFace dataset – used 24 000 images
  2. The LFW dataset – used 13 000 images
  3. The Large Age Gap Face Verification Dataset – used 3 800 images
  4. The Real and Fake Face Detection Dataset – used 2 x 1 000 images; both sets contain real-world faces in the wild, but one of the sets was Photoshopped
  5. Flickr-Faces-HQ Dataset – used 11 000 images. This dataset was used by Nvidia when training the GAN network that produced the AI-generator behind The 1 million Fake Faces dataset

Face dataset for test (indication of production quality)

AI-generated – a total of 30 000 images

  1. The 1 million Fake Faces dataset – used (another set of) 30 000 images

Real-world in the wild – a total of 202 000 images

  1. The CelebFaces Attributes dataset – used 202 000 images


Dataset credits and licenses

  1. The 1 million Fake Faces dataset
    StyleGAN algorithm and model by NVIDIA under CC BY-NC 4.0
  2. The Generated Data Dataset
    Photo by Generated Photos
  3. The UTKFace dataset
    The UTKFace dataset is available for non-commercial research purposes only. The copyright belongs to the original owners.
  4. The LFW dataset
    Labeled Faces in the Wild is a public benchmark for face verification.
  5. The Large Age Gap Face Verification Dataset
    Bianco, Simone (2017). "Large Age-Gap Face Verification by Feature Injection in Deep Networks." Pattern Recognition Letters, vol. 90, pp. 36–42. doi:10.1016/j.patrec.2017.03.006
  6. The Real and Fake Face Detection Dataset
    Available at Kaggle, License unknown, Visibility public
  7. Flickr-Faces-HQ Dataset
    The individual images were published in Flickr by their respective authors under either Creative Commons BY 2.0, Creative Commons BY-NC 2.0, Public Domain Mark 1.0, Public Domain CC0 1.0, or U.S. Government Works license. All of these licenses allow free use, redistribution, and adaptation for non-commercial purposes. However, some of them require giving appropriate credit to the original author, as well as indicating any changes that were made to the images. The license and original author of each image are indicated in the metadata.
  8. The CelebFaces Attributes dataset
    Liu, Ziwei; Luo, Ping; Wang, Xiaogang; Tang, Xiaoou (2015). "Deep Learning Face Attributes in the Wild." In Proceedings of the International Conference on Computer Vision (ICCV), December 2015.
