Deep Learning Specialization - Coursera
Neural Style Transfer

Last updated 4 years ago

Neural Style Transfer is the process of using a convolutional neural network to transfer the style of one image (the style image S) onto the content of another (the content image C), generating a new image G.

It is used by several popular apps such as Prisma.

Neurons in the layers of a CNN learn to detect different patterns. Neurons in deeper layers learn to identify more sophisticated patterns than those in shallower layers.

The following cost function is used for neural style transfer:

J(G) = \alpha J_{content}(C, G) + \beta J_{style}(S, G)

The first term (content cost) determines how similar the content of G is to that of C and the second term (style cost) determines how similar the style of G is to that of S.

To generate G, we first initialize its pixels to random values. We then perform gradient descent on J(G), updating the pixel values of G until we obtain the desired style-transferred image.
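The procedure above can be sketched with a deliberately degenerate NumPy toy in which the "features" are the raw pixels themselves, so both cost terms collapse to squared pixel distances and the gradient of J(G) has a closed form (in the real algorithm the gradient is backpropagated through the pre-trained CNN). The image size, learning rate, and step count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.random((8, 8))   # toy 8x8 grayscale "content image"
S = rng.random((8, 8))   # toy "style image"
G = rng.random((8, 8))   # generated image, initialized with random pixels

alpha, beta, lr = 1.0, 0.5, 0.1
for _ in range(300):
    # With identity "features", dJ/dG has a closed form:
    # alpha*(G - C) from the content term, beta*(G - S) from the style term.
    grad = alpha * (G - C) + beta * (G - S)
    G -= lr * grad   # gradient-descent update on the pixels of G
```

In this toy setting G simply converges to the α/β-weighted average of C and S; with genuine CNN features, the two terms instead pull G toward C's content and S's style.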

The Content Cost Function

Say we use a hidden layer l (of a pre-trained CNN) to compute the content cost. (l is usually one of the middle hidden layers; neither too shallow nor too deep).

If a^{[l](C)} and a^{[l](G)} denote the activations of layer l for the images C and G respectively, then C and G have similar content when a^{[l](C)} and a^{[l](G)} are similar.

The content cost function is given by:

J_{content}^{[l]}(C, G) = \frac{1}{2} ||a^{[l](C)} - a^{[l](G)}||^2
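A minimal NumPy sketch of this formula, assuming the layer-l activations have already been extracted from the pre-trained network:

```python
import numpy as np

def content_cost(a_C, a_G):
    """J_content^[l](C, G) = 1/2 * ||a^[l](C) - a^[l](G)||^2.

    a_C, a_G: layer-l activations for C and G, arrays of identical
    shape (e.g. (n_h, n_w, n_c)).
    """
    return 0.5 * np.sum((a_C - a_G) ** 2)
```

Note that some presentations normalize this cost by the layer's dimensions; the 1/2 factor here matches the formula above.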

The Style Cost Function

Say we use hidden layer l's activation to measure style. But what exactly is style?

We define style as the correlation between activations across channels.

Let a_{i,j,k}^{[l]} be the activation at position (i, j) in channel k of layer l. We compute an n_c^{[l]} \times n_c^{[l]} style (Gram) matrix G^{[l]} for each of the images S and G, whose elements give the correlations of activations across channels (k, k' = 1, 2, ..., n_c^{[l]}).

G_{kk'}^{[l]} = \sum_{i=1}^{n_h^{[l]}} \sum_{j=1}^{n_w^{[l]}} a_{i,j,k}^{[l]} \, a_{i,j,k'}^{[l]}
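The double sum over spatial positions can be computed as a single matrix product by flattening the spatial dimensions, as in this NumPy sketch (activation layout (n_h, n_w, n_c) assumed):

```python
import numpy as np

def gram_matrix(a):
    """Style (Gram) matrix G^[l] of one layer's activations.

    a: activations of shape (n_h, n_w, n_c).
    Returns an (n_c, n_c) matrix with G[k, k'] equal to the sum over
    all positions (i, j) of a[i, j, k] * a[i, j, k'].
    """
    n_h, n_w, n_c = a.shape
    flat = a.reshape(n_h * n_w, n_c)  # one row per spatial position
    return flat.T @ flat
```

Because each entry is a dot product of two channel vectors, the matrix is symmetric, with the diagonal entries measuring how active each channel is overall.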

Now we have two style matrices, G^{[l](S)} and G^{[l](G)}. The style cost function for layer l is given by:

J_{style}^{[l]}(S, G) = \frac{1}{(2 n_h^{[l]} n_w^{[l]} n_c^{[l]})^2} ||G^{[l](S)} - G^{[l](G)}||_F^2 = \frac{1}{(2 n_h^{[l]} n_w^{[l]} n_c^{[l]})^2} \sum_{k=1}^{n_c^{[l]}} \sum_{k'=1}^{n_c^{[l]}} (G_{kk'}^{[l](S)} - G_{kk'}^{[l](G)})^2
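Putting the Gram matrix and the normalization together, a self-contained NumPy sketch of the per-layer style cost (activation layout (n_h, n_w, n_c) assumed):

```python
import numpy as np

def layer_style_cost(a_S, a_G):
    """J_style^[l](S, G) for one layer.

    a_S, a_G: layer-l activations for S and G, each of
    shape (n_h, n_w, n_c).
    """
    n_h, n_w, n_c = a_S.shape

    def gram(a):
        flat = a.reshape(n_h * n_w, n_c)
        return flat.T @ flat

    # Squared Frobenius norm of the Gram-matrix difference,
    # scaled by 1 / (2 * n_h * n_w * n_c)^2.
    scale = (2 * n_h * n_w * n_c) ** 2
    return np.sum((gram(a_S) - gram(a_G)) ** 2) / scale
```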

The overall style cost function is as follows:

J_{style}(S, G) = \sum_l \lambda^{[l]} J_{style}^{[l]}(S, G)

where \lambda^{[l]} is a hyperparameter weighting the contribution of layer l.
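The weighted sum over layers can be sketched as follows; the dict-of-activations interface and layer names are illustrative assumptions, and the per-layer cost is restated so the snippet stands alone:

```python
import numpy as np

def layer_style_cost(a_S, a_G):
    # Per-layer style cost, as defined above.
    n_h, n_w, n_c = a_S.shape
    gram = lambda a: a.reshape(n_h * n_w, n_c).T @ a.reshape(n_h * n_w, n_c)
    return np.sum((gram(a_S) - gram(a_G)) ** 2) / (2 * n_h * n_w * n_c) ** 2

def total_style_cost(acts_S, acts_G, lambdas):
    """J_style(S, G) = sum over l of lambda^[l] * J_style^[l](S, G).

    acts_S, acts_G: dicts mapping layer name -> (n_h, n_w, n_c) activations.
    lambdas: dict mapping layer name -> weight lambda^[l] (hyperparameters).
    """
    return sum(w * layer_style_cost(acts_S[l], acts_G[l])
               for l, w in lambdas.items())
```

Spreading the weights across several layers lets the generated image match both fine textures (shallow layers) and larger structures (deeper layers) of the style image.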