The Image, its Representations and Properties
Computer Vision is an interdisciplinary field that deals with making computers gain a higher-level understanding from images and video.
A signal is a function depending on a variable that has a physical meaning.
Functions can be:
continuous: domain and range are continuous
discrete: domain is discrete
digital: domain and range are discrete
An image can be represented by a continuous function f(x,y) where (x,y) denote coordinates in a plane. (We deal with monochromatic static images i.e. time is constant).
The values of the function correspond to brightness/intensity values at the image point i.e. f(x,y)=intensity at (x,y). The image is therefore also called an intensity image.
The range of the intensity values is limited. In a monochromatic image, the lowest value corresponds to black and the highest value corresponds to white. Intensity values bounded by these limits are called gray-levels.
A 2D intensity image is formed by the perspective projection of a 3D scene. This can be modeled by a pinhole camera.
The above figure denotes a pinhole camera with the pinhole at the origin.
(x,y,z) are the coordinates of a point in 3D.
f is the focal length; it is the distance between the pinhole and the image plane.
(u,v) are the coordinates of the point projected onto the 2D image plane.
$u = \frac{f \cdot x}{z}, \quad v = \frac{f \cdot y}{z}$
Note that a 3D scene cannot be reconstructed from a single 2D image because the 2D image lacks depth information.
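The following is a small sketch (not from the notes) of that projection, assuming the convention $u = \frac{f \cdot x}{z}$, $v = \frac{f \cdot y}{z}$; it also illustrates why depth is lost, since points at different depths can map to the same image coordinates:

```python
def project_point(x, y, z, f):
    """Perspective projection of a 3D point (x, y, z) onto the image
    plane of a pinhole camera with focal length f (pinhole at origin)."""
    if z == 0:
        raise ValueError("Point lies in the pinhole plane (z = 0)")
    u = f * x / z
    v = f * y / z
    return u, v

# Two points at different depths project to the same image point,
# which is why depth cannot be recovered from a single 2D image.
print(project_point(1.0, 2.0, 4.0, 2.0))   # (0.5, 1.0)
print(project_point(2.0, 4.0, 8.0, 2.0))   # (0.5, 1.0) -- same projection
```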
The quality of a digital image increases with its:
spatial resolution: proximity of image samples in the image plane
spectral resolution: bandwidth of frequencies captured by the sensor
radiometric resolution: number of distinguishable gray-levels
time resolution: intervals between time samples at which images are captured
As discussed earlier, an image captured by a sensor is expressed as a continuous function f(x,y). However, to be processed by a computer, the image must be represented using an appropriate discrete data structure, such as a matrix.
Image digitization is the process of converting the continuous function into a digital one. It involves sampling and quantization.
First, the continuous function f(x,y) is sampled into a matrix with M rows and N columns.
The continuous image function is digitized at sampling points.
While sampling, it is important to choose:
the sampling period i.e. the distance between two neighboring sampling points in the image
the geometric arrangement of the sampling points i.e. the sampling grid
The sampling grid can be square (every pixel has 4 or 8 equidistant neighbors depending on whether the $D_4$ or $D_8$ distance is used) or hexagonal (every pixel always has 6 equidistant neighbors):
Every sampling point in the grid corresponds to a pixel/image element (or voxel/volume element for a 3D image). A pixel is the smallest unit in an image and cannot be further divided.
A raster is a grid on which neighborhood relation between points is defined.
The transition between the continuous values of the image function and its digital equivalent is called quantization.
Once sampling is done, image quantization is performed to assign an integer value to each continuous sample i.e. the continuous range of the function f(x,y) is split into K intervals.
The number of quantization levels K should be high enough to permit human perception of fine shading details in the image. False contours may occur if the number of quantization levels is lower than that which humans can easily distinguish.
The figure below demonstrates an image before and after digitization:
Sampling digitizes the domain (the spatial coordinates), while quantization digitizes the range (the intensity values).
The finer the sampling (i.e. larger M and N) and quantization (i.e. larger K), the better is the approximation of the continuous function.
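As a rough illustrative sketch (the grid size M x N, the number of levels K and the example function are made-up choices), sampling and uniform quantization of a continuous image function might look like this:

```python
import numpy as np

def digitize(f, M, N, K, x_range=(0.0, 1.0), y_range=(0.0, 1.0)):
    """Sample a continuous image function f(x, y) on an M x N square grid
    and uniformly quantize its values into K gray-levels (0 .. K-1).
    Assumes f is not constant over the grid."""
    xs = np.linspace(*x_range, N)          # sampling points along x
    ys = np.linspace(*y_range, M)          # sampling points along y
    samples = np.array([[f(x, y) for x in xs] for y in ys])

    # Uniform quantization of the continuous range into K intervals.
    lo, hi = samples.min(), samples.max()
    levels = np.floor((samples - lo) / (hi - lo) * (K - 1) + 0.5)
    return levels.astype(np.uint8)

# Example: a smooth intensity ramp; a small K makes false contours visible.
ramp = lambda x, y: x + y
print(digitize(ramp, M=4, N=4, K=4))
```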
Digital images have several metric and topological properties (the latter are not covered here) that differ from those of continuous functions.
Pixels have a finite size and are assumed to be arranged in a rectangular grid. Every pixel contains information about the brightness/intensity at that point in the image. This allows us to represent the image by a 2D matrix whose elements are natural numbers corresponding to quantization levels of the brightness scale.
Distance is a metric property of digital images; it measures how far apart two pixels in an image are.
Distance between two points p and q i.e. D(p,q) must satisfy 3 properties:
identity: D(p,q)>=0 and D(p,q)=0 iff p=q
symmetry: D(p,q)=D(q,p)
triangle inequality: D(p,q)<=D(p,r)+D(r,q)
The distance between two points (i,j) and (h,k) can be computed in several ways:
Euclidean Distance ($L_2$ metric): $D_E((i,j),(h,k)) = \sqrt{(i-h)^2 + (j-k)^2}$. This is computationally costly due to the square root.
D4 Distance (City-Block/Manhattan Distance or $L_1$ metric): $D_4((i,j),(h,k)) = |i-h| + |j-k|$. It is the minimum number of steps in the grid from the starting point to the ending point allowing only horizontal and vertical moves.
D8 Distance (Chess-Board Distance): $D_8((i,j),(h,k)) = \max(|i-h|, |j-k|)$. It is the minimum number of steps in the grid from the starting point to the ending point allowing horizontal, vertical and diagonal moves.
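A minimal sketch of the three metrics for two grid points p = (i, j) and q = (h, k):

```python
import math

def d_euclidean(p, q):
    """Euclidean distance -- accurate but needs a square root."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def d4(p, q):
    """City-block (Manhattan) distance: horizontal/vertical steps only."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d8(p, q):
    """Chess-board distance: a diagonal step also counts as one step."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

p, q = (0, 0), (3, 4)
print(d_euclidean(p, q), d4(p, q), d8(p, q))   # 5.0 7 4
```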
Pixel adjacency is another important property.
Two pixels p, q are said to be 4-neighbors if $D_4(p,q) = 1$ and 8-neighbors if $D_8(p,q) = 1$.
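As an illustration of these definitions, the 4-neighbors and 8-neighbors of a pixel can be enumerated directly (a sketch that ignores image borders):

```python
def neighbors_4(i, j):
    """Pixels q with D4(p, q) = 1."""
    return [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]

def neighbors_8(i, j):
    """Pixels q with D8(p, q) = 1 (the 4-neighbors plus the four diagonals)."""
    return [(i + di, j + dj)
            for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if not (di == 0 and dj == 0)]

print(len(neighbors_4(5, 5)), len(neighbors_8(5, 5)))   # 4 8
```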
A path is a sequence of pixels in which consecutive pixels are adjacent. A path can be:
non-cyclic: all pixels in the path are distinct
cyclic: a non-cyclic path whose first and last pixels are adjacent
If there is a path between two pixels in the image, the pixels are said to be contiguous.
A region (or connected component) is a set of pixels in which every pair of pixels is contiguous.
A region with no holes is called simply contiguous and a region with holes is called multiply contiguous.
Regions can be used to identify objects and segment them from the background.
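As a hedged sketch of how regions can be extracted, the following labels 4-connected components of a binary image with a simple flood fill (the choice of 4-adjacency and the helper name label_regions are assumptions, not from the notes):

```python
from collections import deque

def label_regions(binary):
    """Label 4-connected regions of foreground pixels (value 1) in a
    binary image given as a list of lists; returns a label matrix."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 1 and labels[r][c] == 0:
                current += 1                       # start a new region
                queue = deque([(r, c)])
                labels[r][c] = current
                while queue:
                    i, j = queue.popleft()
                    for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                        if (0 <= ni < rows and 0 <= nj < cols
                                and binary[ni][nj] == 1 and labels[ni][nj] == 0):
                            labels[ni][nj] = current
                            queue.append((ni, nj))
    return labels

img = [[1, 1, 0, 0],
       [0, 1, 0, 1],
       [0, 0, 0, 1]]
print(label_regions(img))   # two regions: labels 1 and 2
```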
Distance Transform (also called Distance Function/Chamfering Algorithm) is a simple application of the concept of distance.
It gives the distance of pixels from an image subset.
The resulting image has 0s for the pixels contained by the subset, low values for nearby pixels and higher values for pixels further away from the subset.
We get a different result based on the type of distance used ($D_4$, $D_8$, or Euclidean).
The accompanying figures show the distance transform computed with two different distance metrics. (Here, the central pixel is taken as the image subset w.r.t. which the distances are computed.)
(I have skipped the Distance Transform algorithm. It basically involves traversing the image using two local masks: one sweeping over the image from the top-left corner from left-to-right in a top-down manner and the other sweeping over the image from the bottom-right corner from right-to-left in a bottom-up manner).
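For completeness, here is a hedged sketch of such a two-pass algorithm for the $D_4$ distance (the initialization and mask shapes follow the usual chamfering scheme, not necessarily the exact variant from the lecture):

```python
INF = 10 ** 9

def distance_transform_d4(subset_mask):
    """Two-pass D4 distance transform. subset_mask is a list of lists with
    1 for pixels belonging to the image subset and 0 elsewhere."""
    rows, cols = len(subset_mask), len(subset_mask[0])
    dist = [[0 if subset_mask[r][c] else INF for c in range(cols)]
            for r in range(rows)]

    # Forward pass: top-left to bottom-right, looking at upper/left neighbors.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)

    # Backward pass: bottom-right to top-left, looking at lower/right neighbors.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if r < rows - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < cols - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist

mask = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
print(distance_transform_d4(mask))   # central pixel as the subset
```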
Color images are mappings to some subset of $\mathbb{R}^3$, e.g. RGB, HSV, etc.
Spectral imagery involves measuring energy at different bands within the electromagnetic spectrum, e.g. satellite images.
RGB is an additive color system. It is built from the primary colors:
R: Red, G: Green, B: Blue
R+G+B = W (White)
R+G = Y (Yellow)
R+B = M (Magenta)
B+G = C (Cyan)
The RGB color scheme is used in color monitors/displays.
CMYK is a subtractive color system. It is built from the secondary colors:
C: Cyan, M: Magenta, Y: Yellow, K: Black
C = W - R = G + B (absorbs R, reflects G and B)
M = W - G = R + B (absorbs G, reflects R and B)
Y = W - B = R + G (absorbs B, reflects R and G)
The CMYK color scheme is used for printing/painting.
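A small sketch of the additive/subtractive relationship with intensities normalized to [0, 1] (the plain CMY complement, without the separate K channel, is an assumption for simplicity):

```python
def rgb_to_cmy(r, g, b):
    """Subtractive complements: C = W - R, M = W - G, Y = W - B."""
    return 1.0 - r, 1.0 - g, 1.0 - b

def cmy_to_rgb(c, m, y):
    """Inverse mapping back to the additive primaries."""
    return 1.0 - c, 1.0 - m, 1.0 - y

print(rgb_to_cmy(1.0, 0.0, 0.0))   # pure red -> (0.0, 1.0, 1.0): no cyan
print(rgb_to_cmy(1.0, 1.0, 0.0))   # yellow (R + G) -> (0.0, 0.0, 1.0)
```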
Image addition helps reduce noise.
Image subtraction is used for motion detection.
Image multiplication can be used for ROI masking.
Image division is used for fixing images with irregular illumination.
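Illustrative sketches of these four point-wise operations using NumPy (the array sizes, noise model and illumination pattern are made-up examples):

```python
import numpy as np

rng = np.random.default_rng(0)
clean = np.full((4, 4), 100.0)

# Addition: averaging several noisy frames of the same scene reduces noise.
noisy_frames = [clean + rng.normal(0, 10, clean.shape) for _ in range(16)]
averaged = np.mean(noisy_frames, axis=0)

# Subtraction: the difference of two frames highlights moving pixels.
frame_t, frame_t1 = noisy_frames[0], noisy_frames[1]
motion = np.abs(frame_t1 - frame_t)

# Multiplication: a binary mask keeps only the region of interest (ROI).
mask = np.zeros_like(clean)
mask[1:3, 1:3] = 1.0
roi = clean * mask

# Division: dividing an unevenly lit image by the illumination image
# flattens the lighting.
illumination = np.linspace(0.5, 1.5, 16).reshape(4, 4)
corrected = (clean * illumination) / illumination

print(np.std(averaged) < np.std(noisy_frames[0]))   # True: noise reduced
```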
The following is another example of the distance transform. Here, the pixels with value 1 constitute the image subset.