convolutional neural nate @onemoredeeper - Tumblr Blog

Implementing “Deep Dream”

Google’s iconic “deep dream” image visualizations were the first exposure I, and a lot of other people, got of the creepy and fascinating potential of deep learning. A while ago, I set out to implement them myself in TensorFlow.

I heavily based my approach on this Deep Dream notebook from the TensorFlow project, and copied some subroutines from it.

Like neural style transfer, deep dreaming plays with feature representations learned by pre-trained image recognition networks. When we’re running a deep image recognition network, each layer transforms the image into a progressively higher-level “feature representation.” The lowest (input) level is raw pixel color, which is followed by edges, corners and gradients, then small recognizable features (e.g. eyes), and then progressively larger high-level features that help the network recognize what it’s trying to recognize.

A convolutional “feature hierarchy.” (Nvidia)

The idea behind deep dreaming is to pick a mid-level feature (one of the squares in the feature hierarchy) and use backpropagation on the input image (gradient ascent on its pixels) to make the network “see” this feature more.

Deep Dream employs a “Laplacian Pyramid” approach to this problem: the input image is resized to various scales, the gradient of the chosen feature with respect to the image is computed by feeding it through the network, and the image is adjusted slightly in that direction. This is repeated a couple hundred times, and you end up with images like this:
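The multi-scale gradient-ascent loop can be sketched in plain Python. This is a toy under loud assumptions: the "image" is a 1-D list of floats, and `feature_value` is a hypothetical linear stand-in for a network feature (a real implementation would take a mid-level activation of a pre-trained CNN and get its gradient from autodiff, e.g. TensorFlow's `tf.GradientTape`). For simplicity it carries the dreamed result up from octave to octave and normalizes each gradient by its mean absolute value; the octave count, scale, and step size are illustrative, and the Laplacian-pyramid details are not reproduced.

```python
import math


def feature_value(img):
    """Hypothetical stand-in for a mid-level network feature: a fixed linear filter."""
    return sum(p * math.sin(i) for i, p in enumerate(img))


def feature_gradient(img):
    """Gradient of feature_value w.r.t. each pixel (constant, since the toy feature is linear)."""
    return [math.sin(i) for i in range(len(img))]


def resize(img, n):
    """Resize a 1-D 'image' to n pixels with linear interpolation."""
    if len(img) == 1 or n == 1:
        return [img[0]] * n
    scale = (len(img) - 1) / (n - 1)
    out = []
    for j in range(n):
        x = j * scale
        i = min(int(x), len(img) - 2)
        t = x - i
        out.append(img[i] * (1 - t) + img[i + 1] * t)
    return out


def deep_dream(img, octaves=3, octave_scale=1.5, steps=20, step_size=0.05):
    """Gradient ascent on the pixels, starting at a coarse scale and ending at full size."""
    sizes = [max(2, round(len(img) / octave_scale ** k)) for k in range(octaves - 1, -1, -1)]
    current = resize(img, sizes[0])
    for size in sizes:
        current = resize(current, size)  # carry the dreamed image up to this octave
        for _ in range(steps):
            grad = feature_gradient(current)
            # Normalizing keeps the step size comparable across octaves.
            norm = max(sum(abs(g) for g in grad) / len(grad), 1e-8)
            current = [p + step_size * g / norm for p, g in zip(current, grad)]
    return current


dreamed = deep_dream([0.0] * 32)
print(len(dreamed), feature_value(dreamed))
```

Each octave starts from the previous octave's result, so patterns that form at low resolution get refined with finer detail at higher resolutions.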

Maximizing a high-level feature from the GoogLeNet image classifier. When GoogLeNet is recognizing images, a high value for this feature probably causes the network to see a dog.

Maximizing different features from the same layer of the network — each is trained to pick up a different texture.

The pre-trained VGG network I used had max-pooling layers, but I found that Deep Dream’s output improves significantly in quality if they’re replaced with average-pooling (left = max, right = average).
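To make the difference concrete, here is non-overlapping 2x2 pooling with both reductions on a small 2-D list (pure Python, illustrative values). A plausible reason average pooling helps here, offered as an interpretation rather than a verified result: during backpropagation, max-pooling passes each window's gradient to a single pixel, while average-pooling spreads it evenly over all four, which tends to give smoother updates to the dreamed image.

```python
def pool2x2(img, reduce):
    """Non-overlapping 2x2 pooling over a 2-D list (height and width must be even)."""
    return [
        [reduce([img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1]])
         for c in range(0, len(img[0]), 2)]
        for r in range(0, len(img), 2)
    ]


def max_pool(img):
    return pool2x2(img, max)


def avg_pool(img):
    return pool2x2(img, lambda xs: sum(xs) / len(xs))


x = [[1, 2, 0, 0],
     [3, 4, 0, 8],
     [5, 5, 1, 1],
     [5, 5, 1, 1]]
print(max_pool(x))  # [[4, 8], [5, 1]]
print(avg_pool(x))  # [[2.5, 2.0], [5.0, 1.0]]
```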

Running for many iterations with a high step size causes very severe distortion, but makes for a cool image.

[🐶 See 🐶 the 🐶 code 🐶 here 🐶]

#deep dream #convolutional neural networks

Classifying Food

For a project, I’m looking for an algorithm to take a photo and detect what food is in it. I decided to try training my own neural network. I’ve been using and reading about large-scale image classifier networks like AlexNet and VGG for a long time, but I’ve never actually trained a large multi-class image classifier from scratch before.

Finding a dataset

Getting lots of labelled images of food isn’t something I can easily do myself, but I was lucky enough to find a couple of large datasets. UEC Food 256 looks promising: several gigabytes of images, each labelled as one of 256 categories. The catch: the project is meant to be deployed in an American college dining hall, but most of the UEC 256 image classes are Japanese dishes.

Luckily, the Food 101 dataset also exists. Compiled by ETH Zurich, it covers much more Western-centric dishes: 1000 images each of 101 food items. The website links to a paper that achieves ~50% accuracy with a 75/25 train/test split.

Approach #1: Building off an existing classifier

The TensorFlow documentation has an article called How to Retrain Inception's Final Layer for New Categories which describes how to use an existing classifier, trained on ImageNet, to quickly train a classifier on a new task.

You load an existing network, feed in your input images, extract the activations of an intermediate layer (a “bottleneck”), and then feed these bottleneck activations as input into a shallower, simpler classifier that’s specific to your task. The intuition is that if the original classifier is general enough, it’ll have done the heavy lifting of learning high-level features that are useful for classification; even though they were originally trained to be useful for ImageNet classification, they’re probably useful for classifying food, too.
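A minimal sketch of the bottleneck idea, with hypothetical pieces throughout: `frozen_extractor` stands in for the pre-trained network cut off at its bottleneck (in this post, Keras' Inception v3), and the trainable part is a single softmax layer trained with plain SGD, rather than the two-layer model described below. The key property is that the extractor never changes, so its outputs can be computed once and cached.

```python
import math
import random


def frozen_extractor(image):
    """Hypothetical stand-in for a pre-trained network cut off at its bottleneck layer.
    It is never trained; here it just returns two hand-picked statistics."""
    return [sum(image) / len(image), max(image) - min(image)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, x):
    return [sum(w * xi for w, xi in zip(Wk, x)) + bk for Wk, bk in zip(W, b)]


def train_head(features, labels, n_classes, epochs=200, lr=0.5, seed=0):
    """Train a softmax-regression head on precomputed bottleneck features with SGD."""
    rng = random.Random(seed)
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    data = list(zip(features, labels))
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            p = softmax(logits(W, b, x))
            for k in range(n_classes):
                err = p[k] - (1.0 if k == y else 0.0)  # d(cross-entropy)/d(logit k)
                b[k] -= lr * err
                for j in range(dim):
                    W[k][j] -= lr * err * x[j]
    return W, b


def predict(W, b, x):
    scores = logits(W, b, x)
    return scores.index(max(scores))


# Toy data: two "dark" images (label 0) and two "bright" images (label 1).
images = [[0.1, 0.2, 0.1], [0.0, 0.1, 0.2], [0.9, 0.8, 0.95], [1.0, 0.9, 0.85]]
labels = [0, 0, 1, 1]
features = [frozen_extractor(im) for im in images]  # computed once, like cached bottlenecks
W, b = train_head(features, labels, n_classes=2)
print([predict(W, b, f) for f in features])
```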

I wrote some code to do this. It took quite a long time to run Keras’ pre-trained Inception v3 model and extract a 2048-unit bottleneck layer for each image, so I skipped the last 30 categories.

In Keras, I trained a simple two-layer fully-connected model to classify the images into those 70 categories. I got up to around 10% accuracy in a couple of epochs (training on just a CPU!), but then it began overfitting: the test accuracy plateaued at 10% while the training accuracy continued to climb.

I wouldn’t write off this approach, though. I think I may have selected a layer too close to the original classifier’s final output. ImageNet classifies 1000 categories, and I selected a layer at which the image had already been reduced to 2048 values, so this layer’s representation of the image may be too ImageNet-specific.

[😢 Code for this approach 😰]

Approach 2: Building a classifier from scratch

The Food 101 dataset seemed big enough, so I figured I’d try to train my own classifier from scratch. I’d read a lot about large-scale image classifiers before, but I’d never implemented one, so this was an interesting experience.

Designing a model architecture

I spent some time reading about various ImageNet classifier architectures. Big, simple models like VGG, with 16 layers of small convolutions and max-pooling, do well, but they have so many parameters that the resulting models are huge and take a long time (and a lot of data!) to train.

Luckily, people have spent a lot of time developing more complex models that do more with less. The Inception architecture from Google caught my attention. It’s built from “Inception modules” that feed intermediate image representations through several different convolution schemes in parallel, then concatenate the outputs. Each module contains a 1x1 convolution, a 1x1 convolution followed by a 3x3 convolution, average-pooling followed by a 1x1 convolution, and a deeper 1x1-3x3-3x3 stack. The intuition is that different image features need to be processed with varying complexity: the Inception module lets simpler features pass through a single 1x1 convolution, but provides deeper processing for more complex ones. Another crucial strategy used by Inception is to use 1x1 convolutions to reduce the number of channels before applying larger-kernel convolutions, since the latter use more parameters.
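The parameter savings from 1x1 "reduction" convolutions are easy to check with arithmetic. The channel counts below (256 in, 64 after reduction) and the branch widths are hypothetical, chosen only to show the effect; biases are ignored.

```python
def conv_weights(k_h, k_w, c_in, c_out):
    """Number of weights in a k_h x k_w convolution (biases ignored)."""
    return k_h * k_w * c_in * c_out


# A 3x3 convolution mapping 256 channels straight to 256 channels...
direct = conv_weights(3, 3, 256, 256)
# ...versus first squeezing to 64 channels with a 1x1 convolution.
reduced = conv_weights(1, 1, 256, 64) + conv_weights(3, 3, 64, 256)
print(direct, reduced, direct / reduced)  # 589824 163840 3.6

# A module concatenates its branches along the channel axis, so its output
# width is the sum of the branch widths (these widths are made up).
branches = {"1x1": 64, "1x1->3x3": 128, "pool->1x1": 32, "1x1->3x3->3x3": 96}
print(sum(branches.values()))  # 320
```

With these numbers, squeezing the channels first cuts the weights of that branch by a factor of 3.6.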

An Inception module

I fed my classifier 64x64x3 color images, and passed them through a number of Inception modules:

The resulting architecture seemed to work pretty well. After training for ~6 epochs, it achieved a 37% top-1 accuracy rate on the 100 categories. It hasn’t converged yet, and I’d like to see what I’ll get after training it for about a day. (I’m using a costly VM and don’t have a huge time allowance, so I haven’t done this yet.)

A couple more details about this architecture:

- softmax cross-entropy loss

- batch size 16

- Adam optimizer with a learning rate of 0.0001

- Number of channels doubles after each pooling operation

- 50% dropout applied after every even-numbered pooling operation
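The last two rules in the list determine the shape of the network once you pick a starting width and a number of pooling steps. A small helper that tabulates them for 64x64 inputs; `base_channels=32` and `n_pools=4` are guesses, not values from this post.

```python
def stage_schedule(input_size=64, base_channels=32, n_pools=4):
    """(spatial size, channels, dropout rate) after each pooling operation."""
    size, channels, stages = input_size, base_channels, []
    for i in range(1, n_pools + 1):
        size //= 2  # each pooling op halves the height and width
        channels *= 2  # ...and the number of channels doubles
        dropout = 0.5 if i % 2 == 0 else 0.0  # dropout after even-numbered pools only
        stages.append((size, channels, dropout))
    return stages


for stage in stage_schedule():
    print(stage)
```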

🍞 Code for this approach 🍉

#classification #imagenet #food #inception #deep

How to train your GAN

Strategies for training generative models

If you’ve heard about algorithms that generate photorealistic images from scratch — human faces wearing fake expressions, daytime photos that look like night, cat sketches that are turned into realistic cat images — you’ve heard of GANs.

Generative Adversarial Networks, which I described a couple posts ago while trying to generate real-looking faces from random vectors of numbers, have huge potential. But there’s a catch, of course. They’re fiendishly hard to train.

In this post, I’ll write a bit about my experience training them, trying out the latest GAN-training fad diets. I’ll also link to better resources to help you train your GAN.

Why are GANs so awful?

Depending on your GAN’s hyperparameters, your GAN might train flawlessly. More likely, though, you’ll fall victim to one of GANs’ well-documented failure modes, and your network won’t learn anything.

You might get stuck with mode collapse. Your generator’s job is to produce outputs that fool the discriminator. It has a little conversation in its head, that goes like this:

Woah! This photo is easy to draw but the discriminator is totally fooled by it. Sucker! My work here is done! I’ll just output this image, every single time, regardless of the random vector I’ve been sent.

⏰ Five minutes later 🕟

Fuck! Looks like the discriminator caught on about this image being fake! Gotta make the image a little different and keep pumping it out.

Because the generator doesn’t output diverse samples, the discriminator doesn’t learn anything very useful, except to flag the specific image (“mode”) that the generator’s currently outputting as fake. I don’t have a solid mathematical intuition for what exactly triggers the generator to start producing a bunch of very similar images, and I’d like to know.

There are plenty of supposed solutions to mode collapse, and I’ll talk about them in a bit.

Another common failure mode is vanishing gradients. This one’s pretty simple: the discriminator gets really good, faster than the generator, and is able to mark the generator’s output as fake with super-high probability. If your discriminator is marking fakes as 99.9999% fake, there’s only a tiny sliver of “realness.” The generator is trained to increase the “realness” of its samples according to the discriminator, so if all we have is a tiny sliver of realness, the gradients are tiny and the training signal is weak. We end up in a sad, sad world where the poor generator doesn’t have the feedback (in the form of gradients) to learn much of anything. You’ll see tiny discriminator loss, and huge generator loss.
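The vanishing-gradient arithmetic can be checked directly. Assuming the original minimax formulation, where the generator minimizes log(1 - D(G(z))) and D is a sigmoid of a logit, the gradient with respect to that logit is -sigmoid(logit) = -D(G(z)). So when D(G(z)) = 1e-6 (the discriminator is 99.9999% sure the sample is fake), the gradient reaching the generator is only about 1e-6. The function names here are illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def generator_objective(logit):
    """Minimax generator objective log(1 - D), with D = sigmoid(logit)."""
    return math.log1p(-sigmoid(logit))


def generator_objective_grad(logit):
    """Analytic derivative of generator_objective w.r.t. the logit: -sigmoid(logit)."""
    return -sigmoid(logit)


for d_fake in (0.5, 0.01, 1e-6):
    logit = math.log(d_fake / (1 - d_fake))  # the logit that gives D(G(z)) = d_fake
    print(d_fake, generator_objective_grad(logit))
```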

WGAN to the rescue

In early 2017, three researchers published a paper introducing the Wasserstein GAN, claiming it alleviated all mode-collapse issues and made training GANs significantly more stable.

Most of the paper is spent establishing a theoretical model of why GANs are difficult to train. In GANs, we attempt to train the generator to match the probability distribution of the real data. To accomplish this with gradient descent, we need to minimize some “distance metric” between the generator’s current output distribution and the real data distribution. The WGAN paper interprets the traditional GAN algorithm as minimizing the Jensen-Shannon divergence, which stays constant when the two distributions don’t overlap. In that situation it provides no useful gradient, which makes training really difficult.

They suggest a different metric, the “earth mover distance” (also known as the Wasserstein distance), which under mild assumptions is continuous everywhere and differentiable almost everywhere, so it offers good gradients. The incredible thing about the paper is that changing the GAN algorithm to be equivalent to minimizing earth mover distance requires just three weird tricks:

Rather than outputting classification probabilities (using something like softmax cross-entropy real/fake), the discriminator should output unbounded real-valued scores. Train this discriminator (they call it a critic instead) to return a large positive number for real inputs and a large negative number for fake inputs.

Clip the critic’s weights after each training iteration: the paper’s authors suggest clipping them to [-0.01, 0.01].

Rather than trying to balance generator/discriminator training, train the critic much more than the generator, ideally close to convergence (the paper’s algorithm runs five critic updates per generator update): the critic should still give good gradients, even if it’s really strong.
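The three tricks translate into very little code. A minimal sketch in plain Python, operating on lists of critic scores and a flat list of weights rather than real tensors (an assumption for readability). `N_CRITIC = 5` and the clip value 0.01 follow the WGAN paper's defaults; the actual parameter updates are left out, so `training_schedule` only shows the order in which they happen.

```python
def mean(xs):
    return sum(xs) / len(xs)


def critic_loss(scores_real, scores_fake):
    """Trick 1: critic loss to minimize, lower when real scores are high and fake scores low."""
    return mean(scores_fake) - mean(scores_real)


def generator_loss(scores_fake):
    """Generator loss to minimize: lower when the critic scores its samples highly."""
    return -mean(scores_fake)


def clip_weights(weights, c=0.01):
    """Trick 2: clip every critic weight into [-c, c] after each critic update."""
    return [max(-c, min(c, w)) for w in weights]


N_CRITIC = 5  # Trick 3: several critic updates per generator update


def training_schedule(generator_steps, n_critic=N_CRITIC):
    """Yield the order of updates: n_critic critic steps, then one generator step."""
    for _ in range(generator_steps):
        for _ in range(n_critic):
            yield "critic"  # minimize critic_loss, then apply clip_weights
        yield "generator"  # minimize generator_loss


print(critic_loss([2.0, 4.0], [-1.0, -3.0]))  # -5.0
print(clip_weights([-0.5, 0.003, 0.2]))  # [-0.01, 0.003, 0.01]
```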

I played around with implementing a WGAN to generate synthetic samples based on MNIST digits and CelebA human faces.

Generating MNIST digits was a breeze — without any hyperparameter tuning, WGAN generated great samples: [📸 See the code]

CelebA didn’t work too well. The generator didn’t collapse and give terrible results — the sample quality clearly improved over time, but training was incredibly slow, far slower than a DCGAN. Here’s what I got before I stopped training:

Least-squares GAN

Another paper promising an improved GAN model is “Least Squares Generative Adversarial Networks.” This post by Agustinus Kristiadi gives a good overview of the paper and makes some bold claims: that LSGAN is as stable as WGAN but not nearly as slow, and that it generates higher-quality samples.

The basic idea is even simpler than WGAN’s: rather than using softmax cross-entropy classification loss in the discriminator, use least-squares regression instead (i.e. train the discriminator to output 1 for real samples and -1 for fake samples, using an L2 loss). This makes sense because it pushes the discriminator to output reasonably-sized numbers like -1 and 1, no matter how “good” or “confident” it is, which should give better gradients than a discriminator trained to convergence using softmax cross-entropy.
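The least-squares losses, sketched in plain Python over lists of raw discriminator outputs. The targets follow the description above (1 for real, -1 for fake); having the generator aim for the "real" target of 1 is an assumption here, and the LSGAN paper discusses other target choices.

```python
def mean(xs):
    return sum(xs) / len(xs)


def lsgan_discriminator_loss(d_real, d_fake, real_target=1.0, fake_target=-1.0):
    """Squared distance of raw discriminator outputs from their targets (no sigmoid)."""
    real_term = mean([(d - real_target) ** 2 for d in d_real])
    fake_term = mean([(d - fake_target) ** 2 for d in d_fake])
    return 0.5 * (real_term + fake_term)


def lsgan_generator_loss(d_fake, target=1.0):
    """The generator wants its samples scored like real data (target is an assumption)."""
    return 0.5 * mean([(d - target) ** 2 for d in d_fake])


print(lsgan_discriminator_loss([1.0], [-1.0]))  # 0.0: outputs exactly on target
print(lsgan_discriminator_loss([1.0], [-3.0]))  # 2.0: "over-confident" fakes are penalized
print(lsgan_generator_loss([-1.0]))  # 2.0
```

Unlike cross-entropy, whose gradient shrinks toward zero once the discriminator is confident, the gradient of the squared error with respect to a fake's score is proportional to its distance from the target, so a sample scored far from "real" still gives the generator a strong signal.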

I wasn’t able to achieve particularly miraculous results using LSGAN on CelebA — I needed a bit of hyperparameter optimization to get anywhere reasonable, and my generator still tended to “collapse” occasionally and stop outputting good images.

That being said, the recent (and super cool) paper “Unsupervised Image-to-Image Translation Networks” uses least-squares loss instead of the traditional GAN formulation, so it’s clearly useful.

Improved WGAN (with Gradient Penalty)

A recent paper finds theoretical issues with one of the tricks WGAN relies on. WGAN uses ‘weight clipping’ to enforce a ‘Lipschitz constraint’ on the critic (roughly, a limit on how fast the critic’s output can change as its input changes). The paper suggests that weight clipping is a suboptimal way to enforce Lipschitz-ness, and ends up biasing the critic towards simpler models of the true distribution. Instead of clipping weights, they suggest adding a penalty to the critic loss that encourages the norm of the critic’s gradient with respect to its input images to be close to 1.
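A sketch of the gradient penalty term in plain Python. Real implementations get the gradient from autodiff; to stay self-contained, this version uses central finite differences, and the critic is any function from a flat list of pixels to a score. Following the paper, the penalty is evaluated at a random interpolation between a real and a fake sample and weighted by λ = 10; the function and parameter names are illustrative.

```python
import math
import random


def numerical_grad(f, x, h=1e-5):
    """Central finite-difference gradient of f at x (autodiff replaces this in practice)."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def gradient_penalty(critic, real, fake, lam=10.0, rng=None):
    """lam * (||grad critic(x_hat)|| - 1)^2 at a random point x_hat between real and fake."""
    if rng is None:
        rng = random.Random()
    eps = rng.random()
    x_hat = [eps * r + (1 - eps) * f for r, f in zip(real, fake)]
    norm = math.sqrt(sum(g * g for g in numerical_grad(critic, x_hat)))
    return lam * (norm - 1.0) ** 2


def steep(x):
    """A linear toy critic whose gradient is (3, 4) everywhere, so its gradient norm is 5."""
    return 3.0 * x[0] + 4.0 * x[1]


# Penalty is 10 * (5 - 1)^2 = 160 regardless of the interpolation point.
print(gradient_penalty(steep, [1.0, 2.0], [0.0, 0.0], rng=random.Random(0)))
```

The critic's total loss is then its WGAN loss plus this term, with weight clipping removed.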

Implementing the gradient penalty isn’t that difficult: I was able to get an improved WGAN working on MNIST pretty quickly. 🎒 Here’s the code for that.

Other tricks

There are lots and lots of people offering strategies for improving GAN training and stability. Many of these work for WGAN and LSGAN as well, but may not be as useful, given those algorithms’ stability promises.

Here are some I find interesting:

Store a ‘replay buffer’ of previous generator outputs. Occasionally, rather than training the discriminator on the latest generator outputs, train it on some old generator outputs — it should still know they’re fake!

Using a different optimizer in the discriminator rather than Adam (which is usually my go-to), since momentum apparently might cause instability.

“Principled” attempts to balance discriminator and generator strength. I’ve tried keeping track of discriminator accuracy and pausing discriminator training while its accuracy is above 80%, giving the generator a chance to “catch up.”
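The replay-buffer tip can be sketched as a small class that keeps a bounded pool of old generator outputs and swaps some of them into each batch of fakes shown to the discriminator. The capacity, the 50% mix, and the random-overwrite policy are all assumptions chosen for illustration.

```python
import random


class ReplayBuffer:
    """Bounded pool of old generator outputs to re-show the discriminator as fakes."""

    def __init__(self, capacity=1000, seed=None):
        self.capacity = capacity
        self.items = []
        self.rng = random.Random(seed)

    def add(self, samples):
        for s in samples:
            if len(self.items) < self.capacity:
                self.items.append(s)
            else:
                # Once full, overwrite a random old entry (one of several possible policies).
                self.items[self.rng.randrange(self.capacity)] = s

    def sample(self, k):
        return self.rng.sample(self.items, min(k, len(self.items)))


def mixed_fake_batch(fresh, buffer, old_fraction=0.5):
    """Swap a fraction of the fresh fakes for old ones; all of them are still labelled fake."""
    n_old = min(int(len(fresh) * old_fraction), len(buffer.items))
    batch = fresh[: len(fresh) - n_old] + buffer.sample(n_old)
    buffer.add(fresh)  # remember this step's outputs for later
    return batch


buffer = ReplayBuffer(capacity=4, seed=0)
for step in range(3):
    fresh = [f"fake-{step}-{i}" for i in range(4)]  # stand-ins for generator outputs
    print(mixed_fake_batch(fresh, buffer))
```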

There are plenty of GAN-training resources that provide more (and better motivated) tricks than these — here are some of my favorites:

GAN Hacks by Soumith Chintala

Improved GANs from OpenAI

#gan #generative models #wgan #lsgan #DCGAN

Seamless textures from a single image: style transfer and SGAN

If you’re looking for an image to use as the background of a website or a texture on a 3D model, chances are, you want it to be seamless — if you tile a bunch of instances of the same image as a grid, there won’t be any seams or discontinuities near the edges.

Making these images isn’t easy. But what if we could synthesize repeating images entirely algorithmically, using an existing image as a guide to specify the type of texture we’re looking for?

Gatys et al.’s neural style transfer paper suggests that you can synthesize textures with the ordinary neural style transfer algorithm, which I implemented earlier. The trick is to skip specifying a content image input – just ask the network to take a blank canvas and optimize it to match the style of another image.

I modified this to produce seamless images by adding an additional constraint that the boundaries of the image, when tiled side-by-side, also match the style of the source image.
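One way to phrase such a constraint, assuming you already have a style loss: also compute the style loss on a copy of the canvas that has been cyclically shifted by half its width and height, which moves the tile boundaries into the middle of the image where the loss can see them. This is a sketch of one possible formulation, not necessarily how the original code does it; `style_loss` in the comment is hypothetical, and the helpers below only implement the wrap-around shift (the 2-D analogue of `numpy.roll`).

```python
def roll2d(img, dy, dx):
    """Cyclically shift a 2-D list down by dy rows and right by dx columns (like numpy.roll)."""
    h, w = len(img), len(img[0])
    return [[img[(r - dy) % h][(c - dx) % w] for c in range(w)] for r in range(h)]


def seam_view(img):
    """Shift by half the height and width so the tile edges meet in the middle."""
    return roll2d(img, len(img) // 2, len(img[0]) // 2)


# Hypothetical use inside the optimization loop, given some style_loss function:
#   loss = style_loss(canvas) + style_loss(seam_view(canvas))
# Both views must look like the source texture, including across the wrapped edges.

tile = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
print(seam_view(tile))  # [[7, 8, 5, 6], [3, 4, 1, 2]]
```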

This actually works decently well, and produced a couple of interesting textures:

These images don’t have jarring discontinuities at the boundaries, but they do have some not-so-great quirks — the frequencies of the textures seem to die down near the edges, rather than producing a robust texture that crosses the boundary:

A different approach

I wrote earlier about implementing a convolutional generative adversarial network (GAN) to generate images. These models are good at generating images because they’re trained to distinguish between their own generated images and the “real” images from a dataset, and simultaneously trained to fool this “discriminator.”

I stumbled across a paper describing a variant of GAN called Spatial Generative Adversarial Networks, which claims to be able to generate arbitrarily large textures, as well as repeating textures. It’s like the DCGAN I described before, but with a couple important differences:

- rather than using a random vector as the “seed” for the generator, it takes in a 2D image of random noise, and translates this into a larger, detailed image

- the output of the discriminator is also an image, describing how locally realistic each part of the image is

- there are no fully-connected layers, only convolutions

These changes make it fully convolutional — rather than operating on fixed-size images, it operates by applying convolutional filters across arbitrarily-sized images.

By feeding in arbitrarily large random noise images, you can generate arbitrarily large textures. But here’s the trick: you can make a random noise image, tile it 2x2, feed it into the network, and crop out the center of the output (half its width and half its height). The crop will be seamless, since its opposite edges were generated from the same noise.
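The tiling trick is independent of the network itself, so it can be sketched around any generator function. In this plain-Python sketch, images and noise are 2-D lists and `generator` is a hypothetical placeholder for the trained fully convolutional SGAN generator. For a real convolutional generator, the crop should tile cleanly as long as each output pixel's receptive field stays away from the padded outer border of the 2x2 mosaic (an assumption worth checking for a given network); the identity generator in the example makes the wrap-around structure easy to see.

```python
def tile2x2(noise):
    """Repeat a 2-D noise grid twice horizontally and twice vertically."""
    doubled_rows = [row + row for row in noise]
    return doubled_rows + [list(row) for row in doubled_rows]


def center_crop(img, h, w):
    top, left = (len(img) - h) // 2, (len(img[0]) - w) // 2
    return [row[left:left + w] for row in img[top:top + h]]


def seamless_texture(generator, noise):
    """Run a fully convolutional generator on 2x2-tiled noise and keep the central half."""
    out = generator(tile2x2(noise))
    return center_crop(out, len(out) // 2, len(out[0]) // 2)


def identity(img):  # placeholder for a trained generator
    return img


print(seamless_texture(identity, [[1, 2], [3, 4]]))  # [[4, 3], [2, 1]]
```

Because the crop size is computed from the generator's output, the same code works for a generator that upsamples its input.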

GANs are really hard to train, so it took a long time to find the right hyperparameters for this network — once I did, it actually seemed to work decently well:

🌴 Style transfer-based code

👾 SGAN code

#seamless #texture #tileable #dcgan #gan #style transfer #texture synthesis

Generating fake human faces with DCGAN

Recent advances in generative adversarial networks have made it possible to feed a neural network a dataset of images (e.g. photos of faces) and generate images that look like they belong to the dataset, but don’t.

The Deep Convolutional Generative Adversarial Network (DCGAN) paper outlines a technique for generating medium-sized images (usually 32x32 or 64x64 pixels). Like other GAN architectures, DCGAN trains two neural networks in tandem: one network generates images (initially producing random noise), and another network tries to tell real images from fake ones. As this discriminator improves, the generator is trained to fool it by backpropagating through the discriminator. Ideally, as the generator gets better at fooling the discriminator, the discriminator improves too, forcing the generator to produce ever higher-quality fakes.

I trained a modified DCGAN architecture on the CelebA dataset. Mine diverges from DCGAN in that it uses average-pooling in the discriminator instead of strided convolutions, but both serve the same purpose of downsampling.

After a lot of hyperparameter tuning (GANs are notoriously difficult to train in a stable way), I got it to work! After a couple hours, I was generating images that looked like this:

Clearly fakes, but clearly doing a decent job of emulating the real dataset.

Since the generator takes in a random noise vector as a seed for the generated image, it’s possible to explore the latent space by feeding in latent vectors that vary only slightly. I created a GIF by interpolating between latent vectors and capturing the generator’s output:
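Linear interpolation between two latent vectors takes only a few lines. A sketch in plain Python: the 100-dimensional Gaussian latent vectors and the frame count are assumptions, and each frame would then be passed through the trained generator (not shown) to render one frame of the GIF.

```python
import random


def lerp(a, b, t):
    """Point a fraction t of the way from latent vector a to latent vector b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def interpolation_path(z_start, z_end, n_frames):
    """n_frames evenly spaced latent vectors from z_start to z_end (n_frames >= 2)."""
    return [lerp(z_start, z_end, i / (n_frames - 1)) for i in range(n_frames)]


rng = random.Random(0)
z_a = [rng.gauss(0.0, 1.0) for _ in range(100)]  # 100-dim latent size is an assumption
z_b = [rng.gauss(0.0, 1.0) for _ in range(100)]
frames = interpolation_path(z_a, z_b, n_frames=30)
# Each frame would then be rendered with the trained generator, e.g. generator(frame).
print(len(frames), len(frames[0]))  # 30 100
```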

😉 😱 Code here 🙃 😡

#generative #dcgan #celeba #faces #spooky

prostheticknowledge

Give your kids futuristic names with a neural network!

Medium post from Nate Parrott uses neural networks to generate new first names of the future:

We live in the future. Computers drive cars, fight parking tickets and raise children. Why not let machines name our children, too? What if a computer program could find the ideal baby name. Maybe it’s a perfect combination of both parents’ names—or maybe it’s a name that’s completely unique.

I trained a neural network on a list of 7500 popular American baby names, forcing it to turn each name into a mathematical representation called an embedding. Once I had a model that could translate between names and their embeddings, I could generate new names, blend existing names together, do arithmetic on names, and more.

More Here

Use deep neural nets to add, subtract, mix and generate baby names.

Trained an LSTM-based variational autoencoder on the top 7500 American baby names, and did some cool things with it.

#autoencoder #generative #lstm

Generating fake handwritten MNIST digits using variational autoencoders

I'm interested in generative models that use neural networks to produce new content. I took the MNIST dataset and trained a variational autoencoder: a neural network that tries to compress its input image into a vector of numbers (a latent vector) and then reconstruct it. You can do arithmetic on these latent vectors, for example "blending" two digits together by averaging their vectors. Latent vectors can also be sampled at random to generate new digit-like images.
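The latent-vector operations described above, sketched in plain Python. `reparameterize` is the standard VAE sampling step (z = μ + σ·ε), `kl_to_standard_normal` is the usual KL term that keeps latent vectors close to a unit Gaussian so that random samples decode to plausible digits, and `blend` does the weighted averaging. The encoder and decoder networks are omitted; the 10-dimensional latent size matches the caption below, and everything else is illustrative.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), so gradients can flow to mu and log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)) summed over latent dimensions."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def blend(z_a, z_b, weight=0.5):
    """Weighted average of two latent vectors; weight=0.5 is a plain average."""
    return [(1 - weight) * a + weight * b for a, b in zip(z_a, z_b)]


def sample_prior(dim=10, rng=random):
    """A random latent vector from the N(0, 1) prior, to be decoded into a new digit."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


z_zero, z_two = sample_prior(), sample_prior()  # stand-ins for two encoded digits
halfway = blend(z_zero, z_two)  # decoding this would give a hybrid of the two
print(len(halfway))
```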

Some original MNIST digits (top) and the network's reconstruction of them, after compressing them down to vectors of 10 floating-point numbers.

A 0 is transformed into a 2 by taking weighted sums of their vectors, then running the "reconstruction" step to produce an image.

Some images generated by sampling random latent vectors and reconstructing an image from each.

#mnist #handwriting #autoencoder #generative

Implementing neural style transfer

Using a network pre-trained to classify images from the ImageNet dataset, I implemented the "A Neural Algorithm of Artistic Style" paper (which is similar to, but not the same as, what's used by the photo-filter app Prisma). The algorithm's underlying assumption is that the activations of earlier layers of a convolutional neural network represent the "stylistic" aspect of an image (the details of strokes, colors used, etc.), and that the later, higher-level layers represent the high-level "content" of an image. I used gradient descent to generate an image whose low-level feature statistics (correlations between feature channels, captured in Gram matrices) were similar to those of the "style image," while its higher-level feature activations stayed similar to those of the "content" image.
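The "style" comparison in this algorithm is done on Gram matrices: for one layer, the dot products between every pair of feature channels, averaged over spatial positions. They discard where features occur and keep which features co-occur. A plain-Python sketch on tiny feature maps (C channels by N positions, flattened); the feature values are made up, and the normalization constants differ from the paper's as a simplification.

```python
def gram_matrix(features):
    """features: C channels, each a flat list of N spatial activations.
    Returns the C x C matrix of channel co-activations, averaged over positions."""
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features] for fi in features]


def style_loss(gen_features, style_features):
    """Mean squared difference between Gram matrices (normalization simplified)."""
    g, s = gram_matrix(gen_features), gram_matrix(style_features)
    c = len(g)
    return sum((g[i][j] - s[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)


def content_loss(gen_features, content_features):
    """Half the squared difference of raw activations at a higher layer."""
    return 0.5 * sum((a - b) ** 2
                     for fa, fb in zip(gen_features, content_features)
                     for a, b in zip(fa, fb))


feats = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.5]]
shuffled = [[3.0, 1.0, 2.0], [0.5, 0.0, 1.0]]  # same features, positions permuted
print(style_loss(feats, shuffled))  # 0.0: Gram matrices ignore where features occur
print(content_loss(feats, shuffled))  # 3.75: raw activations do not
```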

A "content" image, a "style" image, and the mixture of both images using style transfer.

CODE💻IS💻HERE

#style transfer

Recognizing real-world images from CIFAR-10 using convolutional networks

I loosely followed the TensorFlow CIFAR-10 tutorial, using a simple convolutional neural network to classify 32x32 images into one of 10 categories (cats, deer, airplanes, etc.). Running on a GPU for less than an hour, it achieved 75% accuracy.

A few of the 60,000 images from CIFAR-10 – 32x32-pixel images labelled with one of ten categories.

and here’s the code 🌴

#convolutional neural networks #supervised #image recognition
