As participants in the 2018 Workshop in Machine Learning Applications for Computer Graphics (Cohen-Or, Fogel), we were exposed to many interesting ideas in the fields of artificial intelligence and computer vision, such as variational autoencoders (VAE) and deep convolutional generative adversarial networks (DCGAN). In the latter part of the course, we were asked to choose a paper to study and implement. Skimming through articles, we discovered an interesting paper from 2017 titled Age Progression/Regression by Conditional Adversarial Autoencoder (Zhang, Song, et al.). The article presents a method to perform age modification on a given face image, with exciting uses ranging from recreational applications to assisting in searches for missing children.
The system was written in Python 3.7 and PyTorch 0.4.1, with an effort to keep the code as compatible as possible with older versions of Python 3 and PyTorch. Other external packages used are NumPy, scikit-learn, OpenCV, imageio and Matplotlib.
The network comprises an encoder, which transforms RGB images into Z vectors (vectors in a latent space); a generator, which transforms vectors back into RGB images; a discriminator that measures (and enforces) a uniform distribution on the encoder's output; and a discriminator that measures (and enforces) realistic properties of the generator's output.
Encoder with 5 convolutional layers and a fully connected layer. Viewing from left to right, face images of dimensions 128×128×3 are transformed into unlabeled Z vectors of size 50 in a latent space.
Generator with 7 deconvolutional layers and a fully connected layer. Viewing from left to right, labeled Z vectors of size 70 in a latent space are transformed into face images of dimensions 128×128×3.
Discriminator on Z with 4 fully connected layers.
Discriminator on images with 4 convolutional layers and 2 fully connected layers.
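To make the dimensions above concrete, here is a minimal PyTorch sketch of the encoder and generator. The layer counts and tensor sizes follow the descriptions above, but the channel widths, kernel sizes, and activations are our illustrative assumptions, not the project's exact hyperparameters:

```python
# Minimal sketch of the encoder (128x128x3 -> Z of size 50) and the
# generator (labeled Z of size 70 -> 128x128x3). Channel widths and
# activations are assumptions for illustration.
import torch
import torch.nn as nn

Z_SIZE = 50       # unlabeled latent vector
LABEL_SIZE = 20   # label entries appended to Z (70 - 50)

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        channels = [3, 64, 128, 256, 512, 1024]
        layers = []
        for c_in, c_out in zip(channels, channels[1:]):   # 5 conv layers
            layers += [nn.Conv2d(c_in, c_out, 5, stride=2, padding=2),
                       nn.ReLU(inplace=True)]
        self.conv = nn.Sequential(*layers)                # 128x128 -> 4x4
        self.fc = nn.Linear(1024 * 4 * 4, Z_SIZE)

    def forward(self, x):
        return torch.tanh(self.fc(self.conv(x).view(x.size(0), -1)))

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(Z_SIZE + LABEL_SIZE, 1024 * 4 * 4)
        channels = [1024, 512, 256, 128, 64, 32]
        layers = []
        for c_in, c_out in zip(channels, channels[1:]):   # 5 upsampling layers
            layers += [nn.ConvTranspose2d(c_in, c_out, 5, stride=2,
                                          padding=2, output_padding=1),
                       nn.ReLU(inplace=True)]
        # two final stride-1 layers refine and map to 3 RGB channels,
        # for 7 deconvolutional layers in total
        layers += [nn.ConvTranspose2d(32, 16, 5, stride=1, padding=2),
                   nn.ReLU(inplace=True),
                   nn.ConvTranspose2d(16, 3, 5, stride=1, padding=2),
                   nn.Tanh()]
        self.deconv = nn.Sequential(*layers)

    def forward(self, z_labeled):
        x = self.fc(z_labeled).view(-1, 1024, 4, 4)
        return self.deconv(x)   # N x 3 x 128 x 128
```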
- Python 3.7
- PyTorch 0.4.1
- Python data science and graphics packages: NumPy, scikit-learn, OpenCV, imageio and Matplotlib
For training, we use the UTKFace dataset, which was collected by the original authors of the article and tested in their implementation. UTKFace contains over 20,000 aligned and cropped face images with their appropriate labels. We wrote a special utility, the UTKFace Labeler, which sorts the dataset images into separate folders based on the label, to match PyTorch's convention that classes are determined by the folder structure.
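The core of such a labeler could look like the sketch below. It relies on UTKFace filenames encoding the labels as `age_gender_race_timestamp`; the ten age-group boundaries here are illustrative assumptions, not necessarily the ones used by the project:

```python
# Sketch of the UTKFace Labeler idea: copy images into one folder per
# age group so that torchvision.datasets.ImageFolder-style loading can
# infer the class from the folder structure.
import os
import shutil

AGE_GROUPS = [(0, 5), (6, 10), (11, 15), (16, 20), (21, 30),
              (31, 40), (41, 50), (51, 60), (61, 70), (71, 120)]

def age_to_group(age):
    for idx, (low, high) in enumerate(AGE_GROUPS):
        if low <= age <= high:
            return idx
    raise ValueError("age %d out of range" % age)

def sort_dataset(src_dir, dst_dir):
    for name in os.listdir(src_dir):
        if not name.endswith(".jpg"):
            continue
        age = int(name.split("_")[0])   # first filename field is the age
        group_dir = os.path.join(dst_dir, str(age_to_group(age)))
        os.makedirs(group_dir, exist_ok=True)
        shutil.copy(os.path.join(src_dir, name),
                    os.path.join(group_dir, name))
```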
Before training, one random batch of images is set aside from the dataset and used for validation, meaning that the network does not backpropagate losses on it. We expect the losses on the validation batch to decrease at each epoch similarly to their change in the rest of the dataset. After every epoch, an image comparing the original validation images with their reconstructions is saved to the epoch's folder, allowing a human eye to monitor the training session. An example can be seen here:
Original images are on the right and generated images are on the left. It can be seen that centered, frontal images with natural postures are reconstructed more accurately than others. Also, rare features such as glasses, jewelry and watermarks are subdued.
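A comparison image like the one above could be produced along these lines. This is a sketch with hypothetical helper names, using imageio (one of the project's dependencies); for simplicity it stacks the two sets of images as two rows:

```python
# Save a side-by-side comparison of validation originals and their
# reconstructions; no gradients are computed (validation only).
import torch
import imageio

def save_validation_comparison(encoder, generator, images, labels, path):
    with torch.no_grad():                         # no backpropagation
        z = encoder(images)
        recon = generator(torch.cat([z, labels], dim=1))
    # tensors are N x 3 x 128 x 128 in [-1, 1]; tile each batch along
    # the width, then stack reconstructions above the originals
    grid = torch.cat([torch.cat(list(batch), dim=2)
                      for batch in (recon, images)], dim=1)
    array = ((grid.permute(1, 2, 0) + 1) * 127.5).clamp(0, 255)
    imageio.imwrite(path, array.byte().numpy())
```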
At the end of each epoch, all of the calculated losses are passed to a class we designed, called Loss Tracker. The loss tracker object produces graphs of the changes in losses over epochs and saves them, again allowing a human to analyze and verify the training session. The loss tracker also enables pre-programmed heuristics to address issues such as overfitting, underfitting, unknown fitting, and drift. It is also possible to watch the graphs update in a new window during training. An example can be seen here:
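The essence of the Loss Tracker can be sketched as follows; the fitting heuristics mentioned above are omitted for brevity, and the method names are illustrative rather than the project's actual API:

```python
# Minimal Loss Tracker sketch: accumulate named losses per epoch and
# plot their history with Matplotlib.
from collections import defaultdict
import matplotlib.pyplot as plt

class LossTracker:
    def __init__(self):
        self.history = defaultdict(list)   # loss name -> value per epoch

    def append(self, **losses):
        for name, value in losses.items():
            self.history[name].append(value)

    def plot(self, path):
        plt.figure()
        for name, values in self.history.items():
            plt.plot(values, label=name)
        plt.xlabel("epoch")
        plt.ylabel("loss")
        plt.legend()
        plt.savefig(path)
        plt.close()

# usage: tracker.append(reconstruction=0.7, dz=0.4); tracker.plot("losses.png")
```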
To start a training session, run `main.py --mode train <num of epochs> --input <dataset path> --output <results path>`

For the full list of options for the training session, run `main.py --help`
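As a rough sketch (not the project's actual source), the command-line interface shown above could be wired with argparse like this, covering only the flags demonstrated in this section:

```python
# Hypothetical argparse wiring for main.py, based on the usage shown above.
import argparse

parser = argparse.ArgumentParser(description="Age Progression/Regression by CAAE")
parser.add_argument("--mode", default="train", help="operating mode, e.g. train")
parser.add_argument("epochs", type=int, nargs="?", default=1,
                    help="number of training epochs")
parser.add_argument("--input", required=True, help="dataset path")
parser.add_argument("--output", required=True, help="results path")
args = parser.parse_args()
```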
We developed a few applications on top of Jupyter Notebook to test the system with the trained models interactively. As inputs, users can choose already labeled images from UTKFace, to observe the results with regard to attributes such as age, gender, and race. The applications, referred to as Games, can be seen further down this section.
The Aging Game. An input image is fed to the encoder, and the resulting Z vector is fed to the generator ten times, each time with the true gender and a different age group. Then, we present the original image next to all of the output images. The output images can be seen as the aging process of a person, from childhood to old age. We mark the original image and the generated image of the same age group with a white rectangle, for comparison.
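The Aging Game loop can be sketched as below. The label layout (one-hot age group followed by the gender repeated to fill 20 entries, so that the labeled vector has size 70) is our assumption about the encoding, chosen only to match the dimensions above:

```python
# Sketch of the Aging Game: encode once, decode with every age group.
import torch

def make_label(age_group, gender, num_ages=10):
    # assumed encoding: one-hot age group + gender repeated (20 entries)
    age = torch.zeros(num_ages)
    age[age_group] = 1.0
    gen = torch.full((num_ages,), float(gender))
    return torch.cat([age, gen]).unsqueeze(0)          # 1 x 20

def aging_game(encoder, generator, image, gender, num_ages=10):
    z = encoder(image.unsqueeze(0))                    # 1 x 50
    faces = []
    for age_group in range(num_ages):
        label = make_label(age_group, gender, num_ages)
        faces.append(generator(torch.cat([z, label], dim=1)))
    return torch.cat(faces)       # ten faces, from childhood to old age
```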
The Morph Game. Two input images are fed to the encoder, and the resulting Z vectors are concatenated with their true labels. Then, a set of vectors interpolating between the two labeled vectors is fed to the generator, as sketched below. The output images can be seen as a morphing process from one person to another, where not only the personality features change but also age and gender, allowing examination of concepts such as immediate age transition between age groups and gender fluidity.
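A sketch of the Morph Game, assuming simple linear interpolation between the two labeled vectors; the number of steps is arbitrary:

```python
# Sketch of the Morph Game: decode every point on the straight line
# between the two labeled Z vectors.
import torch

def morph_game(encoder, generator, img_a, label_a, img_b, label_b, steps=10):
    za = torch.cat([encoder(img_a.unsqueeze(0)), label_a], dim=1)  # 1 x 70
    zb = torch.cat([encoder(img_b.unsqueeze(0)), label_b], dim=1)
    frames = []
    for t in torch.linspace(0.0, 1.0, steps):
        frames.append(generator((1 - t) * za + t * zb))
    return torch.cat(frames)     # gradual morph from person A to person B
```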
The Kids Game. Two input images are fed to the encoder. Then, per each index i, a coefficient β_i is generated uniformly in the semi-open range [0, 1), and a new Z vector element is generated by z_new[i] = β_i·z_1[i] + (1 − β_i)·z_2[i], so that the new Z vector is a random element-wise mix of the two input vectors. The new vector is then fed to the generator, and the output image can be seen as a hypothetical child of the two input people.
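The mixing step can be sketched as follows; `kids_game` and `child_label` are hypothetical names, and the child label is assumed to be that of the youngest age group:

```python
# Sketch of the Kids Game: each entry of the child's Z vector is a
# random convex combination of the corresponding parent entries.
import torch

def kids_game(encoder, generator, img_a, img_b, child_label):
    za = encoder(img_a.unsqueeze(0))
    zb = encoder(img_b.unsqueeze(0))
    beta = torch.rand_like(za)                 # uniform in [0, 1)
    z_child = beta * za + (1 - beta) * zb      # element-wise mix
    return generator(torch.cat([z_child, child_label], dim=1))
```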
- Mattan Serry
- Hila Balahsan
- Dor Alt
This project is licensed under the TAU License
- TensorFlow implementation of Age Progression/Regression by Conditional Adversarial Autoencoder
- This project was supported by Amazon Web Services.