Deep Learning Specialization - Coursera
Sliding Window Detection

Once a CNN has been trained on closely cropped images that contain just the target object with minimal background (say, a car), it can detect that object in a larger test image with background by sliding a window over the image: each windowed region is cropped out, passed as an input to the CNN, and classified as car / not car.
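
To make the procedure concrete, here is a minimal sketch of the naive sliding-window loop (illustrative only; the classifier, window size, and stride are assumptions, not code from these notes):

```python
def sliding_window_detect(image, classifier, window=14, stride=2):
    """Run `classifier` independently on every window-sized crop of `image`.

    `image` is assumed to be an H x W x 3 array; `classifier` is assumed to
    map a (window, window, 3) crop to class probabilities. All names and
    sizes here are illustrative.
    """
    h, w, _ = image.shape
    detections = []
    for top in range(0, h - window + 1, stride):
        for left in range(0, w - window + 1, stride):
            crop = image[top:top + window, left:left + window]
            probs = classifier(crop)        # one full forward pass per crop
            detections.append(((top, left), probs))
    return detections
```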

This is, however, computationally inefficient: every window position requires its own forward pass through the CNN, and overlapping windows repeat most of the same computation.

A more efficient convolutional implementation of sliding windows is discussed below.

Convolutional Implementation of Sliding Windows

Consider the CNN shown below, which takes a 14x14 image crop as input and ends in fully connected (FC) layers followed by a 4-way softmax:

We can convert the FC layers (and the softmax layer) into convolutional layers: an FC layer with n units acting on an h x w x c activation volume is equivalent to a convolution with n filters of size h x w, so the same weights can be applied at every window position in one pass (a small numeric check of this equivalence follows the next paragraph).

Say we have a 28x28 image. With the traditional sliding-window approach, a 14x14 window and a stride of 2, the window is evaluated 8 times from left to right for each row (and over 8 rows from top to bottom), i.e. 64 crops and 64 separate forward passes.
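A quick numeric check of the FC-to-convolution equivalence (illustrative only; the 5x5x16 volume and 400 units are assumed layer sizes, not taken from these notes):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5, 16))                # one 5x5x16 activation volume
W_fc = rng.standard_normal((5 * 5 * 16, 400))      # FC layer with 400 units

fc_out = x.reshape(-1) @ W_fc                      # FC output, shape (400,)

W_conv = W_fc.reshape(5, 5, 16, 400)               # same weights viewed as a 5x5 conv kernel
conv_out = np.tensordot(x, W_conv, axes=([0, 1, 2], [0, 1, 2]))  # conv applied at one position

print(np.allclose(fc_out, conv_out))               # True: identical computation
```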

With the convolutional implementation, the same result is obtained in a single forward pass: each spatial position of the network's output corresponds to one placement of the window in the traditional approach. For the 28x28 image the output volume is 8x8x4, and the 8 labels obtained for the first row of window positions in the traditional approach match the first row of that 8x8x4 output (and likewise for the other rows).
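As a sketch of what the fully convolutional network might look like in Keras (the filter counts and intermediate layer sizes are assumptions; only the 14x14 window, the 28x28 input, the stride of 2, and the 8x8x4 output come from the example above):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Fully convolutional sliding-window classifier (a sketch, not the notes' code).
model = models.Sequential([
    layers.Input(shape=(None, None, 3)),        # accepts any image size
    layers.Conv2D(16, 5, activation="relu"),    # 14x14x3 -> 10x10x16 (filter count assumed)
    layers.MaxPooling2D(2),                     # -> 5x5x16; this pooling gives the stride of 2
    layers.Conv2D(400, 5, activation="relu"),   # replaces FC(400): -> 1x1x400
    layers.Conv2D(400, 1, activation="relu"),   # replaces FC(400): -> 1x1x400
    layers.Conv2D(4, 1, activation="softmax"),  # replaces the 4-way softmax: -> 1x1x4
])

# A 14x14 crop yields a single 1x1x4 prediction; the full 28x28 image yields an
# 8x8x4 grid in one forward pass, one prediction per 14x14 window at stride 2.
print(model(tf.zeros((1, 28, 28, 3))).shape)    # (1, 8, 8, 4)
```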

This approach is significantly more efficient, because the convolutional computation shared by overlapping windows is done once for the whole image instead of being repeated for every crop.