# Edge Detection

An **edge** is a set of pixels where the intensity changes abruptly.

Edge detectors are used to locate such changes in the intensity function.

Changes in a function are described by its derivatives. An image is a function of two variables $$(x, y)$$, so operators describing edges are expressed using partial derivatives.

An edge has **magnitude (edge strength)** and **direction**. The magnitude is equal to that of the gradient and the direction is *perpendicular* to that of the gradient. Note that the gradient points in the direction of the most rapid change in the intensity.

The gradient is given by $$\triangledown f = \left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right]$$

The magnitude of the gradient (and thereby of the edge) is $$||\triangledown f|| = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2+\left(\frac{\partial f}{\partial y}\right)^2}$$ and the direction of the gradient is $$\theta = \tan^{-1}\left(\frac{\partial f / \partial y}{\partial f / \partial x}\right)$$.
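As a minimal sketch of these formulas, the snippet below estimates the two partial derivatives with central differences at one interior pixel of a tiny hand-made image, then derives the magnitude and direction. The helper names (`gradient_at`, `magnitude_and_direction`), the test image, and the choice of central differences are illustrative assumptions, not part of the original notes. `atan2` is used instead of a plain arctangent so that a zero $$\frac{\partial f}{\partial x}$$ does not cause a division by zero.

```python
import math

def gradient_at(img, r, c):
    """Central-difference estimate of (df/dx, df/dy) at an interior pixel.

    Assumption: x runs along columns and y runs along rows.
    """
    fx = (img[r][c + 1] - img[r][c - 1]) / 2.0
    fy = (img[r + 1][c] - img[r - 1][c]) / 2.0
    return fx, fy

def magnitude_and_direction(fx, fy):
    """Gradient magnitude and gradient direction (radians)."""
    magnitude = math.hypot(fx, fy)   # sqrt(fx^2 + fy^2)
    theta = math.atan2(fy, fx)       # safe when fx == 0
    return magnitude, theta

# Vertical step edge: dark on the left, bright on the right.
image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]

fx, fy = gradient_at(image, 1, 1)
mag, theta = magnitude_and_direction(fx, fy)

# The edge itself runs perpendicular to the gradient.
edge_direction = theta + math.pi / 2

print(fx, fy, mag, theta)
```

Here the gradient points along the x axis (from dark to bright), while the edge runs vertically, which matches the statement that the edge direction is perpendicular to the gradient.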

Edges are usually used to find region **boundaries**. If a region has homogeneous brightness, its boundary/border will be the pixels at which the image function (brightness) abruptly changes.

The following figure shows common **edge profiles**:

![](/files/-M5-0UiVJfCYd8bML-lc)

Edge detectors are tuned to detect a specific type of edge profile.

## Criteria for Optimal Edge Detection

* **Good Detection**: Minimize false positives and false negatives
* **Good Localization**: Detected edges must be as close as possible to the actual edges
* **Single Response**: Minimize the number of local maxima around the true edge

## Commonly Used Edge Detection Operators

Gradient operators (edge detectors) can be approximated using convolutions. Operators that detect edge direction are represented by a collection of masks (one for each direction), while those that don't detect direction are represented by a single mask.

A few edge detection operators are discussed below:

### Sobel Operator

$$M_x = \begin{bmatrix}-1 & 0 & 1\\-2 & 0 & 2\\-1 & 0 & 1\end{bmatrix}, M_y = \begin{bmatrix}-1 & -2 & -1\\0 & 0 & 0\\1 & 2 & 1\end{bmatrix}$$

$$M_x$$ and $$M_y$$ are used to calculate the partial derivatives $$\frac{\partial f}{\partial x}$$ and $$\frac{\partial f}{\partial y}$$ respectively. Since $$M_x$$ measures change along x, it responds to vertical edges; $$M_y$$ measures change along y and responds to horizontal edges.

The Sobel operator is therefore used to detect horizontal and vertical edges.
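The following sketch applies the two Sobel masks to a small image containing a vertical step edge. It is only an illustration: `apply_mask` is a hypothetical helper that slides the mask over the image without flipping it (strictly a correlation, which for these masks differs from a true convolution only in sign) and computes only positions where the mask fits entirely inside the image.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def apply_mask(img, mask):
    """Slide a square mask over the image (no flipping, no padding)."""
    k = len(mask)
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(rows - k + 1):
        out_row = []
        for c in range(cols - k + 1):
            out_row.append(sum(mask[i][j] * img[r + i][c + j]
                               for i in range(k) for j in range(k)))
        out.append(out_row)
    return out

# Vertical step edge: intensity changes only along x.
image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]

gx = apply_mask(image, SOBEL_X)   # response to change along x
gy = apply_mask(image, SOBEL_Y)   # response to change along y
magnitude = [[math.hypot(a, b) for a, b in zip(row_x, row_y)]
             for row_x, row_y in zip(gx, gy)]

print(gx)   # [[40, 40], [40, 40]]
print(gy)   # [[0, 0], [0, 0]]
```

Only the x mask responds, which is consistent with $$M_x$$ acting as the vertical-edge detector.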

### Prewitt Operator

$$M_x = \begin{bmatrix}-1 & 0 & 1\\-1 & 0 & 1\\-1 & 0 & 1\end{bmatrix}, M_y = \begin{bmatrix}-1 & -1 & -1\\0 & 0 & 0\\1 & 1 & 1\end{bmatrix}$$

### Roberts Operator

$$M_x = \begin{bmatrix}0 & 1\\-1 & 0\end{bmatrix}, M_y = \begin{bmatrix}1 & 0\\0 & -1\end{bmatrix}$$

**Disadvantage**: It is highly sensitive to noise, because each 2×2 mask uses very few pixels to approximate the gradient.

### Kirsch Operator

$$M_x = \begin{bmatrix}-5 & 3 & 3\\-5 & 0 & 3\\-5 & 3 & 3\end{bmatrix}, M_y = \begin{bmatrix}-5 & -5 & -5\\3 & 0 & 3\\3 & 3 & 3\end{bmatrix}$$

### Laplace Operator ($$\triangledown^2$$)

While all the previously mentioned operators approximate the first derivative, the Laplace operator approximates the second derivative. It gives only the magnitude of the edge, not its direction, and it is rotationally invariant.

The following are the masks for 4-neighborhood and 8-neighborhood:

$$M_4=\begin{bmatrix}0 & 1 & 0\\1 & -4 & 1\\0 & 1 & 0\end{bmatrix}, M_8=\begin{bmatrix}1 & 1 & 1\\1 & -8 & 1\\1 & 1 & 1\end{bmatrix}$$

**Disadvantage**: It responds doubly to some edges in the image.
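The double response can be seen on a simple step edge. The sketch below applies both Laplace masks to a small test image; `apply_mask` is a hypothetical helper (mask slid without flipping, no padding) and the image is made up for illustration.

```python
LAPLACE_4 = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
LAPLACE_8 = [[1, 1, 1], [1, -8, 1], [1, 1, 1]]

def apply_mask(img, mask):
    """Slide a square mask over the image (no flipping, no padding)."""
    k = len(mask)
    rows, cols = len(img), len(img[0])
    return [[sum(mask[i][j] * img[r + i][c + j]
                 for i in range(k) for j in range(k))
             for c in range(cols - k + 1)]
            for r in range(rows - k + 1)]

# Vertical step edge between columns 2 and 3.
image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]

response_4 = apply_mask(image, LAPLACE_4)
response_8 = apply_mask(image, LAPLACE_8)

print(response_4[0])   # [0, 10, -10, 0]
print(response_8[0])   # [0, 30, -30, 0]
```

A single edge produces one positive and one negative response on either side of it, and the mask gives zero in the flat regions. The sign change between the two responses is exactly what the zero-crossing approach exploits.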

## Zero-Crossings of the Second Derivative

The issue with the previously discussed operators is that their response depends on the size of the objects in the image and that they are sensitive to noise.

The **Marr-Hildreth operator** uses the concept of *zero-crossings of the second derivative* to detect edges.

We know that a step edge corresponds to a sudden change in the image function. The first derivative of the image function should have an extremum at the position corresponding to the edge in the image. This implies that the second derivative at that position will be zero.

Since it is much easier to locate a zero-crossing than to find an extremum, detecting edges using zero-crossings is more convenient.

![](/files/-M5-0UiZYW2XcVYBoq85)

The image shows a step edge with its first and second derivatives. The first derivative has an extremum at the edge location, while the second derivative crosses zero at that location.
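The same behaviour can be checked numerically on a one-dimensional intensity profile. This is a small sketch with made-up sample values; the function names and the use of finite differences as derivative estimates are assumptions for illustration.

```python
def first_difference(f):
    """Forward difference: entry i is the slope between samples i and i+1."""
    return [f[i + 1] - f[i] for i in range(len(f) - 1)]

def second_difference(f):
    """Central second difference at interior samples 1 .. len(f)-2."""
    return [f[i - 1] - 2 * f[i] + f[i + 1] for i in range(1, len(f) - 1)]

def zero_crossings(d2, offset=1):
    """Pairs of sample positions between which d2 changes sign.

    `offset` maps list indices back to sample positions, since
    second_difference starts at sample 1.
    """
    return [(i + offset, i + 1 + offset)
            for i in range(len(d2) - 1)
            if d2[i] * d2[i + 1] < 0]

# A slightly blurred step edge.
profile = [0, 0, 1, 4, 9, 10, 10]

d1 = first_difference(profile)     # [0, 1, 3, 5, 1, 0]
d2 = second_difference(profile)    # [1, 2, 2, -4, -1]

steepest = d1.index(max(d1))       # slope is largest between samples 3 and 4
crossings = zero_crossings(d2)     # [(3, 4)]

print(steepest, crossings)
```

The first difference peaks between samples 3 and 4, and the second difference changes sign between the same two samples, so both criteria locate the same edge.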

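Before computing the derivatives, we must smooth the image to reduce noise, and we use Gaussian smoothing to do so. We also know that the Laplace operator approximates the second derivative. Because both smoothing and differentiation are linear operations, they can be combined into a single operator: $$\triangledown^2 (G * f) = (\triangledown^2 G) * f$$, where $$G$$ is the Gaussian and $$f$$ is the image. So, all we need to do is convolve the image with the **Laplacian of the Gaussian (LoG)**.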

This is denoted by $$\triangledown^2 G$$ and is approximated by convolving an image with the following mask:

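$$\begin{bmatrix}0 & 0 & -1 & 0 & 0\\0 & -1 & -2 & -1 & 0\\-1 & -2 & 16 & -2 & -1\\0 & -1 & -2 & -1 & 0\\0 & 0 & -1 & 0 & 0\end{bmatrix}$$

The coefficients of this mask sum to zero, so it gives no response in regions of constant intensity.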

Due to its shape, the inverted LoG operator is commonly called a **Mexican Hat**.

After convolving an image with the above operator, the locations in the convolved image where zero is crossed correspond to the edges in the input image.

The advantage of using this approach instead of the previously discussed (smaller) operators is that a larger area surrounding the pixel is taken into account, and the influence of distant pixels is reduced according to the $$\sigma$$ of the Gaussian.
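To make the role of $$\sigma$$ concrete, the sketch below samples the analytic LoG, $$\triangledown^2 G(x, y) \propto \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4} e^{-\frac{x^2 + y^2}{2\sigma^2}}$$, on a square grid. The function name `log_kernel`, the mask size, the value of $$\sigma$$, and the mean-subtraction step are assumptions chosen for illustration; the constant normalization factor is dropped because it does not move the zero-crossings.

```python
import math

def log_kernel(size, sigma):
    """Sample the (unnormalized) Laplacian of Gaussian on a size x size grid."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            r2 = x * x + y * y
            value = (r2 - 2 * sigma ** 2) / sigma ** 4
            value *= math.exp(-r2 / (2 * sigma ** 2))
            row.append(value)
        kernel.append(row)
    # Sampling and truncation leave a small non-zero sum; subtracting the
    # mean makes the mask respond with zero on constant regions.
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]

kernel = log_kernel(size=9, sigma=1.4)
total = sum(map(sum, kernel))

print(round(kernel[4][4], 3))   # centre weight (negative for the LoG)
```

The centre weight is negative here; negating the whole mask gives the inverted "Mexican Hat" form. A larger $$\sigma$$ spreads the weights over a wider neighborhood, so the mask size would need to grow with it.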

```
Sidenote: The LoG operator can be approximated by a convolution with a mask that is obtained by
subtracting two Gaussian masks with significantly different sigma values. This method is called
Difference of Gaussians (DoG).
```
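A rough sketch of the DoG construction is given below. The helper names, the 9×9 mask size, and the two sigma values (1.0 and 1.6) are illustrative assumptions, not values taken from these notes.

```python
import math

def gaussian_kernel(size, sigma):
    """Sampled Gaussian, normalized so that the weights sum to 1."""
    half = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (2 * sigma ** 2))
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]

def dog_kernel(size, sigma_narrow, sigma_wide):
    """Difference of Gaussians: narrow Gaussian minus wide Gaussian."""
    narrow = gaussian_kernel(size, sigma_narrow)
    wide = gaussian_kernel(size, sigma_wide)
    return [[a - b for a, b in zip(row_n, row_w)]
            for row_n, row_w in zip(narrow, wide)]

dog = dog_kernel(size=9, sigma_narrow=1.0, sigma_wide=1.6)
dog_total = sum(map(sum, dog))

print(dog[4][4] > 0, dog[4][0] < 0)   # True True
```

With the narrow Gaussian subtracted from the wide one reversed in this way (narrow minus wide), the mask has a positive centre and a negative surround, the same shape as the inverted LoG. Because both Gaussians are normalized to sum to 1, the DoG weights sum to zero up to rounding error.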

**Disadvantages of LoG operation**:

* it may smooth the shape too much (sharp corners may be lost)
* it tends to create closed loops of edges

## Pattern Matching

This is an approach where we use a "matched filter" containing the shape of the object that has to be detected in the image. We then slide the filter over the image and correlate the image with the "matched filter" to detect objects.

This doesn't generalize well, since we need a separate filter for each object. It is also not invariant to the scale (the same object appearing at different sizes) or the orientation (rotation) of the object.
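A minimal sketch of the sliding correlation is shown below, using a made-up binary image and a plus-shaped template. The function name `match_template` is hypothetical, and subtracting the template mean before correlating is an added assumption (it stops uniformly bright regions from scoring highly); plain correlation would omit that step.

```python
def match_template(image, template):
    """Return the top-left position with the highest correlation score."""
    th, tw = len(template), len(template[0])
    mean = sum(map(sum, template)) / (th * tw)
    zero_mean = [[t - mean for t in row] for row in template]

    best_score, best_pos = float("-inf"), None
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            score = sum(zero_mean[i][j] * image[r + i][c + j]
                        for i in range(th) for j in range(tw))
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score

template = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]

image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

position, score = match_template(image, template)
print(position)   # (1, 2)
```

The template is found only because it appears at the same size and orientation as in the filter. A rotated or rescaled plus would need its own template, which is the limitation described above.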

## Canny Edge Detection

The Canny edge detector is designed to be optimal with respect to the three criteria listed earlier (good detection, good localization, single response). It is especially well suited to step edges corrupted by white noise.

**Canny Edge Detection Steps**:

1. Use a **Gaussian filter** to reduce noise (smoothing)
2. Find the **intensity gradient** of the image using the Sobel operator (or another operator such as Prewitt or Roberts). The operator gives the first derivative in the horizontal and vertical directions, from which the gradient magnitude and direction are computed. The gradient direction is then rounded to one of four angles (0°, 45°, 90°, 135°) representing the horizontal, vertical and two diagonal directions.
3. **Non-Maximum Suppression** is performed: the image is scanned to remove pixels that are unlikely to be part of an edge. Each pixel is checked to determine whether its gradient magnitude is a local maximum in its neighborhood along the gradient direction. If not, it is suppressed.
4. **Hysteresis Thresholding** is performed: this stage decides which pixels actually form edges. Two thresholds (low and high) are applied to the gradient magnitude. Pixels with magnitudes greater than the high threshold are strong edge pixels and are retained. Pixels with magnitudes lower than the low threshold are discarded. Pixels with magnitudes between the two thresholds are retained only if they are connected to strong edge pixels. The final result is the set of strong edges in the image together with the weaker pixels attached to them.
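The last step can be sketched as a flood fill that starts from the strong pixels. The code below assumes the gradient magnitudes have already been computed and thinned by the earlier steps; the function name `hysteresis`, the use of 8-connectivity, the chaining of weak pixels through other weak pixels, and the sample values are all assumptions for illustration.

```python
from collections import deque

def hysteresis(magnitude, low, high):
    """Keep strong pixels, plus weak pixels connected to a strong pixel."""
    rows, cols = len(magnitude), len(magnitude[0])
    edges = [[False] * cols for _ in range(rows)]
    queue = deque()

    # Seed the search with every strong pixel.
    for r in range(rows):
        for c in range(cols):
            if magnitude[r][c] > high:
                edges[r][c] = True
                queue.append((r, c))

    # Grow outwards through 8-connected pixels that are at least `low`.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not edges[nr][nc]
                        and magnitude[nr][nc] >= low):
                    edges[nr][nc] = True
                    queue.append((nr, nc))
    return edges

magnitude = [
    [0, 0, 0, 0, 0],
    [0, 9, 5, 4, 0],
    [0, 0, 0, 0, 0],
    [5, 0, 0, 0, 1],
]

edges = hysteresis(magnitude, low=3, high=8)
for row in edges:
    print("".join("#" if e else "." for e in row))
```

The weak pixels with values 5 and 4 in the second row are kept because they chain back to the strong pixel with value 9. The isolated 5 in the bottom-left corner is dropped even though it exceeds the low threshold, and the 1 is dropped because it is below it.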

