Neural Style Transfer
Neural Style Transfer is the process of using a convolutional neural network (CNN) to transfer the style of one image (the style image S) onto another image (the content image C), generating a new image G.
It is used by several popular apps such as Prisma.
Neurons in the layers of a CNN learn to detect different patterns. Neurons in deeper layers learn to identify more sophisticated patterns than those in shallower layers.
Neural style transfer minimizes the following cost function:

$$J(G) = \alpha \, J_{\text{content}}(C, G) + \beta \, J_{\text{style}}(S, G)$$

The first term (the content cost) measures how similar the content of G is to that of C, and the second term (the style cost) measures how similar the style of G is to that of S. The hyperparameters $\alpha$ and $\beta$ weight the two terms.
To generate G, we first initialize its pixels with random values. We then run gradient descent on $J(G)$, updating the pixel values of G until we obtain the required styled image.
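As a minimal sketch of this procedure, the toy example below runs gradient descent directly on the pixels of G, using raw pixels as "features" and only a content-like term. Real style transfer instead backpropagates $J(G)$ through a pretrained CNN and includes the style term; this only illustrates the two steps (random initialization, then pixel updates).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the content image C (illustration only; real NST
# compares CNN activations, not raw pixels).
C = rng.random((8, 8))

# Step 1: initialize G with random pixel values.
G = rng.random((8, 8))

alpha, lr = 1.0, 0.1
for _ in range(500):
    # Toy cost: J(G) = alpha * ||G - C||^2 (content term only).
    grad = 2 * alpha * (G - C)  # analytic gradient of the toy cost
    G -= lr * grad              # Step 2: gradient descent on G's pixels

print(np.abs(G - C).max())  # G has converged toward C
```

With both terms present, each gradient step pulls G's pixels toward the content of C and the channel statistics of S simultaneously.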
Say we use a hidden layer l (of a pre-trained CNN) to compute the content cost. (l is usually one of the middle hidden layers; neither too shallow nor too deep).
If $a^{[l](C)}$ and $a^{[l](G)}$ denote the activations of layer l for the images C and G respectively, then the images C and G have similar content if $a^{[l](C)}$ and $a^{[l](G)}$ are similar.
The content cost function is given by:

$$J_{\text{content}}(C, G) = \frac{1}{2} \left\lVert a^{[l](C)} - a^{[l](G)} \right\rVert^2$$
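This cost is straightforward to compute from the two activation volumes. A short numpy sketch, with randomly generated arrays standing in for the layer-l activations:

```python
import numpy as np

def content_cost(a_C, a_G):
    """J_content = 1/2 * sum of squared differences between the
    layer-l activations of C and G (shape: n_H x n_W x n_C)."""
    return 0.5 * np.sum((a_C - a_G) ** 2)

rng = np.random.default_rng(0)
a_C = rng.random((4, 4, 3))  # stand-in layer-l activations for C
a_G = rng.random((4, 4, 3))  # stand-in layer-l activations for G

print(content_cost(a_C, a_G))   # positive for differing content
print(content_cost(a_C, a_C))   # identical content -> cost 0.0
```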
Say we use hidden layer l's activation to measure style. But what exactly is style?
We define style as the correlation between activations across channels.
Let $a^{[l]}_{i,j,k}$ be the activation at position (i, j) in channel k of layer l. For each image (S and G) we compute a style matrix $G^{[l]}$ whose elements denote the correlations of activations across channels:

$$G^{[l]}_{kk'} = \sum_{i=1}^{n_H^{[l]}} \sum_{j=1}^{n_W^{[l]}} a^{[l]}_{i,j,k} \, a^{[l]}_{i,j,k'}, \qquad k, k' = 1, 2, \ldots, n_C^{[l]}$$
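This double sum is just a Gram matrix: flatten the spatial dimensions and multiply the result by its own transpose. A sketch with a stand-in activation volume:

```python
import numpy as np

def style_matrix(a):
    """Gram matrix of an activation volume: entry (k, k') is the sum
    over all spatial positions (i, j) of a[i, j, k] * a[i, j, k']."""
    n_H, n_W, n_C = a.shape
    flat = a.reshape(n_H * n_W, n_C)  # one row per spatial position
    return flat.T @ flat              # shape (n_C, n_C)

rng = np.random.default_rng(0)
a = rng.random((4, 4, 3))  # stand-in layer-l activations
G_mat = style_matrix(a)

# Spot-check one entry against the summation definition:
assert np.isclose(G_mat[0, 2], np.sum(a[:, :, 0] * a[:, :, 2]))
```

Channels that tend to activate together at the same spatial positions get large entries, which is exactly the "co-occurrence of patterns" notion of style.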
Now, we will have two style matrices, $G^{[l](S)}$ and $G^{[l](G)}$. The style cost function for layer l is given by:

$$J^{[l]}_{\text{style}}(S, G) = \frac{1}{\left(2\, n_H^{[l]} n_W^{[l]} n_C^{[l]}\right)^2} \sum_{k} \sum_{k'} \left( G^{[l](S)}_{kk'} - G^{[l](G)}_{kk'} \right)^2$$
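Putting the Gram matrices and the normalization constant together, the per-layer style cost can be sketched as follows (again with random arrays standing in for real CNN activations):

```python
import numpy as np

def layer_style_cost(a_S, a_G):
    """J_style^[l]: squared Frobenius distance between the two Gram
    matrices, scaled by 1 / (2 * n_H * n_W * n_C)^2."""
    n_H, n_W, n_C = a_S.shape
    gram = lambda a: a.reshape(-1, n_C).T @ a.reshape(-1, n_C)
    norm = (2 * n_H * n_W * n_C) ** 2
    return np.sum((gram(a_S) - gram(a_G)) ** 2) / norm

rng = np.random.default_rng(1)
a_S = rng.random((4, 4, 3))  # stand-in layer-l activations for S
a_G = rng.random((4, 4, 3))  # stand-in layer-l activations for G

print(layer_style_cost(a_S, a_G))   # positive for differing styles
print(layer_style_cost(a_S, a_S))   # identical style -> cost 0.0
```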
The overall style cost function sums the per-layer costs:

$$J_{\text{style}}(S, G) = \sum_{l} \lambda^{[l]} \, J^{[l]}_{\text{style}}(S, G)$$

where $\lambda^{[l]}$ is a hyperparameter weighting layer l.
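The weighted sum itself is trivial; the numbers below are made-up per-layer costs and weights $\lambda^{[l]}$, purely for illustration:

```python
def total_style_cost(layer_costs, lambdas):
    """J_style = sum over layers l of lambda^[l] * J_style^[l]."""
    return sum(lam * c for lam, c in zip(lambdas, layer_costs))

# Hypothetical per-layer style costs and weights lambda^[l]:
costs = [0.8, 0.5, 0.2]
lambdas = [0.5, 0.3, 0.2]
print(total_style_cost(costs, lambdas))  # ≈ 0.59
```

Spreading the weights over several layers lets the transferred style reflect both low-level texture (shallow layers) and larger-scale patterns (deeper layers).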