Computer vision

Feb 27, 2023 · Junhong Liu · 29 min read

Contents

1. Edge Detection

1.1. Introduction to Intensity Images

  • An intensity image is a data matrix, whose values represent intensities within some range.

  • An intensity image is represented as a single matrix, with each element of the matrix corresponding to one image pixel.

  • The elements in the intensity matrix represent various intensities (or gray levels), where the value 0 usually represents black (zero intensity) and the value 1, 255, or 65535 (depending on the data type) usually represents full intensity (white).

In MATLAB: to display an intensity image, use the imagesc (“image scale”) function.
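For instance, a minimal sketch (assuming the shakey.150.gif image used in the labs later in this post is on the path):

% Display an intensity image; imagesc scales the data range to the colormap
I = imread('shakey.150.gif');   % uint8 values in 0..255
figure
colormap(gray);
imagesc(I);
colorbar                        % shows how intensities map to gray levels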

1.2. Indexed Images

  • An indexed image consists of a data matrix $X$, and a colormap matrix $map$.

  • $map$ is an $m \times 3$ array of class double containing floating-point values in the range $[0, 1]$.

  • Each row of $map$ specifies the red, green, and blue components of a single color.
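A minimal sketch of an indexed image (toy values assumed for illustration):

% A 2x2 indexed image: X holds row indices into the colormap map
X   = [1 2; 3 1];              % indices into the rows of map
map = [1 0 0; 0 1 0; 0 0 1];   % m-by-3 doubles in [0,1]: red, green, blue
figure
image(X)                       % pixel (i,j) is drawn with colour map(X(i,j),:)
colormap(map)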

1.3. Intensity Gradients

  • The image is a function mapping coordinates to intensity $f(x,y)$

  • The gradient of the intensity is a vector with an $x$ and a $y$ component. $$ \vec{G}[f(x,y)] = \left[ \begin{matrix} G_x \\ G_y \end{matrix} \right] = \left[ \begin{matrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{matrix} \right] $$

  • Magnitude - $M(\vec{G}) = \sqrt{G^2_x + G^2_y}$

  • Direction - $\alpha(x,y) = \arctan\left(\frac{G_y}{G_x}\right)$

1.4. Approximating the Gradient

In an image it is difficult to compute the true gradient, but we can easily approximate it.

For example: $$ G_x \cong f(i,j+1) - f(i,j) = \left[ \begin{matrix} -1 & 1 \end{matrix} \right] $$ $$ G_y \cong f(i,j) - f(i+1,j) = \left[ \begin{matrix} 1 \\ -1 \end{matrix} \right] $$

However, these masks estimate $G_x$ and $G_y$ at different locations, as shown below:

Gradient1

Thus, if we want to estimate both components at the same location, we can use $2 \times 2$ masks instead. $$ G_x = \left[ \begin{matrix} -1 & 1 \\ -1 & 1 \end{matrix} \right] ,\ G_y = \left[ \begin{matrix} 1 & 1 \\ -1 & -1 \end{matrix} \right] $$

Gradient2

For each mask of weights, multiply each pixel by its corresponding weight and sum over all pixels.

1.4.1. Other edge detectors

Roberts

$$ G_x = \left[ \begin{matrix} 1 & 0 \\ 0 & -1 \end{matrix} \right] ,\ G_y = \left[ \begin{matrix} 0 & -1 \\ 1 & 0 \end{matrix} \right] $$

Sobel

$$ G_x = \left[ \begin{matrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{matrix} \right] ,\ G_y = \left[ \begin{matrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{matrix} \right] $$

Gradient3

1.5. Convolution

Convolution is the computation of weighted sums of image pixels. For each pixel $p[i,j]$ in the image, the value $h[i,j]$ is calculated by translating the mask to pixel $p[i,j]$ and taking the weighted sum of the pixels in its neighbourhood.

Convolution
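As a small sketch (toy image assumed), the $2 \times 2$ $G_x$ mask from above can be applied with MATLAB's conv2. Note that conv2 performs true convolution (it rotates the mask by 180°), while filter2 applies the mask as written (correlation):

% Apply the 2x2 Gx mask to a toy image with one vertical edge
Gx  = [-1 1; -1 1];
img = [0 0 5 5;
       0 0 5 5;
       0 0 5 5];
hx_corr = filter2(Gx, img, 'valid')          % mask applied as written
hx_conv = conv2(img, rot90(Gx,2), 'valid')   % same result via convolution

Both calls respond with 10 at the edge columns and 0 elsewhere.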

1.6. Edge Detection

Steps:

  • Take image
  • Convolve mask with image for each direction ($G_x$ and $G_y$)
  • Calculate the magnitude $M(\vec{G}) = \sqrt{G^2_x + G^2_y}$. However, the square root is expensive to compute, so it is normally replaced by $|G_x| + |G_y|$.

Note: We could detect edges by calculating the intensity change (gradient) across the image, and implement this using the idea of filtering.

Original
Gx
Gy
M

2. Noise Filtering

We can use convolution to remove noise. Note: we should use noise filters before edge detectors.

2.1. Mean filter

One such filter is the mean filter:

$$ \left[ \begin{matrix} \frac{1}{n} & \frac{1}{n} & \frac{1}{n} \\ \frac{1}{n} & \frac{1}{n} & \frac{1}{n} \\ \frac{1}{n} & \frac{1}{n} & \frac{1}{n} \end{matrix} \right] $$

where $n$ is the number of elements in the filter. Here the filter is $3 \times 3$, so $n=9$. Obviously, the mean filter is a linear filter.

% load data
shakey = read_image('','shakey.150.gif');
show_image(shakey);
title("Original")
load sobel.mat % for details, see Lab 1 on this blog

% No noise filter; threshold is 30, using horizontal Sobel operator

shakey_X = conv2(shakey,sobelX,'valid');
show_image(abs(shakey_X)>30);
title("Gx")

% when size is 3 by 3, threshold is 30, using horizontal Sobel operator
sizeFilter = 3;
meanFilter = ones(sizeFilter)./(sizeFilter^2);

shakey_meanFilter = conv2(shakey,meanFilter,'valid');
show_image(shakey_meanFilter);
title("3by3")

shakey_X = conv2(shakey_meanFilter,sobelX,'valid');
show_image(abs(shakey_X)>30);
title("Gx 3by3")

% when size is 5 by 5, threshold is 30, using horizontal Sobel operator
sizeFilter = 5;
meanFilter = ones(sizeFilter)./(sizeFilter^2);

shakey_meanFilter = conv2(shakey,meanFilter,'valid');
show_image(shakey_meanFilter);
title("5by5")

shakey_X = conv2(shakey_meanFilter,sobelX,'valid');
show_image(abs(shakey_X)>30);
title("Gx 5by5")
Original
3by3
5by5
Abs_Gx
Gx_3by3
Gx_5by5

2.2. Gaussian filter

Gaussian filtering is the most widely used noise filter:

$$ \left[ \begin{matrix} 0 & 0.01 & 0.02 & 0.01 & 0 \\ 0.01 & 0.06 & 0.11 & 0.06 & 0.01 \\ 0.02 & 0.11 & 0.16 & 0.11 & 0.02 \\ 0.01 & 0.06 & 0.11 & 0.06 & 0.01 \\ 0 & 0.01 & 0.02 & 0.01 & 0 \end{matrix} \right] $$

In addition, we can replace a 2D Gaussian filter with two 1D Gaussian filters applied in sequence.

For example:
$$ \left[ \begin{matrix} 0.003 & 0.0133 & 0.0219 & 0.0133 & 0.003 \\ 0.0133 & 0.0596 & 0.0983 & 0.0596 & 0.0133 \\ 0.0219 & 0.0983 & 0.1621 & 0.0983 & 0.0219 \\ 0.0133 & 0.0596 & 0.0983 & 0.0596 & 0.0133 \\ 0.003 & 0.0133 & 0.0219 & 0.0133 & 0.003 \end{matrix} \right] $$

which can be decomposed into the outer product of a column vector and a row vector:
$$ \left[ \begin{matrix} 0.0545 \\ 0.2442 \\ 0.4026 \\ 0.2442 \\ 0.0545 \end{matrix} \right] \times \left[ \begin{matrix} 0.0545 & 0.2442 & 0.4026 & 0.2442 & 0.0545 \end{matrix} \right] $$
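A minimal MATLAB check of this separability (the 1D weights are those above; the test image is arbitrary):

% Two 1D passes reproduce the single 2D convolution
g1 = [0.0545 0.2442 0.4026 0.2442 0.0545];  % 1D Gaussian weights
g2 = g1' * g1;                              % outer product: the 5x5 kernel above

img   = rand(100);                          % any test image
out2d = conv2(img, g2, 'same');             % one 2D pass: ~25 multiplies per pixel
out1d = conv2(g1', g1, img, 'same');        % two 1D passes: ~10 multiplies per pixel
max(abs(out2d(:) - out1d(:)))               % differs only by floating-point error

For a $k \times k$ Gaussian this reduces the work per pixel from $k^2$ to $2k$ multiplications.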

3. Advanced Edge Detection

3.1. Laplacian Operator (Second order operators)

In two dimensions, the definition of Laplacian Operator is: $$ \nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} $$

As mentioned before, we approximate the derivative of $f$ in the $x$ direction as: $$ G_x = \frac{\partial f}{\partial x} = f(i,j+1) - f(i,j) $$

where $(i,j)$ are the row and column coordinates of the pixel.

Thus, we have: $$ \begin{aligned} \frac{\partial^2 f}{\partial x^2} &= \frac{\partial G_x}{\partial x} \\ &= \frac{\partial [f(i,j+1) - f(i,j)]}{\partial x} \\ &= \frac{\partial f(i,j+1)}{\partial x} - \frac{\partial f(i,j)}{\partial x} \\ &= [f(i,j+2) - f(i,j+1)] - [f(i,j+1) - f(i,j)] \\ &= f(i,j+2) - 2*f(i,j+1) + f(i,j) \end{aligned} $$

Similarly, we can get $$ \frac{\partial^2 f}{\partial y^2} = f(i+2,j) - 2*f(i+1,j) + f(i,j) $$

Combining these, we get the Laplacian operator as a mask anchored at $(i,j)$ in its top-left corner: $$ \nabla ^2 = \left[ \begin{matrix} 2 & -2 & 1 \\ -2 & 0 & 0 \\ 1 & 0 & 0 \end{matrix} \right] $$

However, in practice, the Laplacian operator is normally approximated by the centred mask: $$ \nabla ^2 \approx \left[ \begin{matrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{matrix} \right] $$

Another commonly used small kernel is the 8-neighbour version: $$ \nabla ^2 \approx \left[ \begin{matrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{matrix} \right] $$
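As a quick sketch using the lab helpers introduced later in this post (read_image and show_image; the threshold value is an assumption):

% Apply the 4-neighbour Laplacian mask and threshold the response
shakey = read_image('','shakey.150.gif');
lap4   = [0 -1 0; -1 4 -1; 0 -1 0];

response = conv2(shakey, lap4, 'valid');  % second-derivative response
show_image(abs(response) > 30);           % crude edges; zero crossings work better
title("Laplacian, threshold = 30")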

3.2. Laplacian of Gaussian (LoG)

We know the Laplacian Operator is: $$ L(x,y) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} $$

The 2D Gaussian Distribution (mean = 0) function is: $$ G_\sigma(x,y) = \frac{1}{2 \pi \sigma^2} \cdot e^{-\frac{x^2+y^2}{2 \sigma^2}} $$

PS: The 1D Gaussian Distribution (mean = 0) function is: $$ G_\sigma(x) = \frac{1}{\sqrt{2 \pi} \sigma} \cdot e^{-\frac{x^2}{2 \sigma^2}} $$

When the mean is 0 and the standard deviation is $\sigma$, we have: $$ LoG(x,y) = -\frac{1}{\pi \sigma^4} \cdot (1 - \frac{x^2+y^2}{2 \sigma^2}) \cdot e^{-\frac{x^2+y^2}{2 \sigma^2}} $$


Formula Derivation:

First, we know $$ LoG(x,y) = \frac{\partial^2 G_\sigma(x,y)}{\partial x^2} + \frac{\partial^2 G_\sigma(x,y)}{\partial y^2} $$

then, we solve $\frac{\partial^2 G_\sigma(x,y)}{\partial x^2}$ and $\frac{\partial^2 G_\sigma(x,y)}{\partial y^2}$ separately.

$$ \frac{\partial}{\partial x}G_\sigma(x,y) = \frac{1}{2 \pi \sigma^2} \cdot \frac{\partial}{\partial x}e^{-\frac{x^2+y^2}{2 \sigma^2}} = - \frac{1}{2 \pi \sigma^2} \cdot \frac{x}{\sigma^2} \cdot e^{-\frac{x^2+y^2}{2 \sigma^2}} $$

and

$$ \begin{aligned} \frac{\partial^2}{\partial x^2}G_\sigma(x,y) &= \frac{1}{2 \pi \sigma^2} \cdot \frac{\partial}{\partial x}(- \frac{x}{\sigma^2} \cdot e^{-\frac{x^2+y^2}{2 \sigma^2}}) \\ &= \frac{1}{2 \pi \sigma^2} (-\frac{1}{\sigma^2} \cdot e^{-\frac{x^2+y^2}{2 \sigma^2}} + \frac{x^2}{\sigma^4} \cdot e^{-\frac{x^2+y^2}{2 \sigma^2}}) \\ &= \frac{1}{2 \pi \sigma^2} \cdot \frac{x^2 - \sigma^2}{\sigma^4} \cdot e^{-\frac{x^2+y^2}{2 \sigma^2}} \end{aligned} $$

Similarly, we can get $$ \frac{\partial^2}{\partial y^2}G_\sigma(x,y) = \frac{1}{2 \pi \sigma^2} \cdot \frac{y^2 - \sigma^2}{\sigma^4} \cdot e^{-\frac{x^2+y^2}{2 \sigma^2}} $$

Thus, we get $$ \begin{aligned} LoG(x,y) &= \frac{\partial^2 G_\sigma(x,y)}{\partial x^2} + \frac{\partial^2 G_\sigma(x,y)}{\partial y^2} \\ &= \frac{1}{2 \pi \sigma^2} \cdot \frac{x^2 + y^2 -2 \sigma^2}{\sigma^4} \cdot e^{-\frac{x^2+y^2}{2 \sigma^2}} \\ &= -\frac{1}{\pi \sigma^4} \cdot (1 - \frac{x^2+y^2}{2 \sigma^2}) \cdot e^{-\frac{x^2+y^2}{2 \sigma^2}} \end{aligned} $$


For example:

% load data
shakey = read_image('','shakey.150.gif');

% create filter
filterSize = 9;              % avoid shadowing MATLAB's built-in size()
valueRange = -4:1:4;
s = 1.4; % standard deviation

LoG_Filter = zeros(filterSize);

% calculate the value of LoG filter
for i = 1:filterSize
    x = valueRange(i);
    LoG_Filter(i,:) = LoG(s,x,valueRange);
end

% apply LoG filter
shakeyLoG = conv2(shakey,LoG_Filter,'valid');

figure();
show_image(shakeyLoG);
title("shakeyLoG")

%% LoG function

function VLoG = LoG(s,x,y)

VLoG = -1/(s^4*pi).*(1 - (x.^2+y.^2)./(s^2*2)) .* exp(-(x.^2+y.^2)./(s^2*2));

end

Output:

LoG

3.3. Canny Edge Detector

Canny showed that the first derivative of the Gaussian closely approximates the operator that optimizes the product of signal-to-noise ratio and localization (an analysis based on step edges corrupted by Gaussian noise).

Steps of Algorithm:

  1. Compute $f_x$ and $f_y$:
$$ f_x = \frac{\partial}{\partial x}(f*G) = f * \frac{\partial}{\partial x}G = f * G_x $$ $$ f_y = \frac{\partial}{\partial y}(f*G) = f * \frac{\partial}{\partial y}G = f * G_y $$

where $G(x,y)$ is the Gaussian function, $G_x(x,y)$ is the derivative of $G(x,y)$ with respect to $x$: $$ \begin{aligned} G_x(x,y) &= \frac{\partial}{\partial x} (\frac{1}{2 \pi \sigma^2} \cdot e^{-\frac{x^2+y^2}{2 \sigma^2}}) \\ &= \frac{1}{2 \pi \sigma^2} \cdot \frac{\partial}{\partial x}e^{-\frac{x^2+y^2}{2 \sigma^2}} \\ &= - \frac{1}{2 \pi \sigma^2} \cdot \frac{x}{\sigma^2} \cdot e^{-\frac{x^2+y^2}{2 \sigma^2}} \\ &= - \frac{x}{\sigma^2} \cdot G(x,y) \end{aligned} $$

In the same way, $G_y(x,y)$ is the derivative of $G(x,y)$ with respect to $y$: $$ G_y(x,y) = - \frac{y}{\sigma^2} \cdot G(x,y) $$

$\sigma$ is the standard deviation of $G(x,y)$.

  2. Compute the gradient magnitude and direction: $$ magn(x,y) = |f_x| + |f_y| \\ dir(x,y) = \arctan(f_y/f_x) $$

  3. Apply non-maxima suppression.

  4. Apply hysteresis thresholding / edge linking.
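MATLAB's built-in edge function implements these steps directly; a quick sketch (image file, thresholds and sigma are assumptions):

% Canny: Gaussian-derivative filtering, non-maxima suppression,
% and hysteresis thresholding in one call
img = im2gray(imread('1.jpg'));              % e.g. a panorama input from Section 5
bw  = edge(img, 'canny', [0.04 0.10], 1.4);  % [low high] thresholds, Gaussian sigma
figure, imshow(bw)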

4. Hough Transform

4.1. Hough Transform Basics

Consider a point $p(x,y)$ in Cartesian space. The Hough transform provides a method to map $p(x,y)$ from Cartesian space to polar space.

In polar space, we describe points by $w$ and $\theta$, where $w$ is the distance from the origin (or reference point) and $\theta$ is the angle. Thus, we map $p(x,y)$ in Cartesian space to $p(w,\theta)$ in polar space. $$ w = \sqrt{x^2 + y^2} \\ \theta = \arctan(y/x) $$

However, infinitely many lines pass through $p(x,y)$. How do we parameterize them?

Take a reference point $p_o$ and a point $p(x,y)$ lying on a line $y = kx + b$. We use $w$ for the distance between $p_o$ and this line, and $\theta$ for the angle between the axis and the normal of this line.

HoughTransform1

Then, substituting $-\frac{\cos\theta}{\sin\theta}$ for $k$ and $\frac{w}{\sin\theta}$ for $b$, we have $$ \begin{aligned} y &= -\frac{\cos\theta}{\sin\theta} \cdot x + \frac{w}{\sin\theta} \\ y \sin\theta &= - x \cos\theta + w \\ w &= x \cos\theta + y \sin\theta \end{aligned} $$

Now, for a given point $p_1(x,y)$, we can describe all lines going through it with the function $w = x \cos\theta + y \sin\theta$.

4.2. Application

Normally, we use the Hough transform for feature extraction: it identifies features in objects, such as lines.

  1. Each straight line in the image space corresponds to a single point in the polar space;
  2. Any segment of that straight line in the image space corresponds to the same point in the polar space.

Thus, the line detection task is accomplished by finding peaks in the Polar space.

HoughTransform2
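A minimal from-scratch accumulator sketch (input image and bin sizes are assumptions): each edge pixel votes for every $(\theta, w)$ pair satisfying $w = x \cos\theta + y \sin\theta$, and peaks in the accumulator reveal lines.

% Vote in a (w, theta) accumulator; peaks correspond to dominant lines
bw = edge(im2gray(imread('1.jpg')), 'canny');   % binary edge map as input
[ys, xs] = find(bw);                            % edge pixel coordinates
thetas = deg2rad(-90:89);                       % quantized normal angles
wMax   = ceil(hypot(size(bw,1), size(bw,2)));   % largest possible |w|
acc    = zeros(2*wMax + 1, numel(thetas));      % rows index w, columns index theta

for k = 1:numel(xs)
    w    = xs(k)*cos(thetas) + ys(k)*sin(thetas);  % w = x cos(t) + y sin(t)
    bins = round(w) + wMax + 1;                    % shift so negative w fits
    for t = 1:numel(thetas)
        acc(bins(t), t) = acc(bins(t), t) + 1;     % cast a vote
    end
end

MATLAB's hough, houghpeaks and houghlines functions implement the same idea.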


5. Panorama Generation

A panorama is formed by stitching together multiple images into one seamless image. A guide on this process can be found here.

Please note: the estgeotform2d function requires MATLAB R2022b or newer.

5.1. Part 1

Three images of the Aston Webb building have been provided.

1
2
3

The following steps need to be taken in order to create the panorama.

  1. Use any preprocessing you like to manipulate the given images. (Here we do not manipulate the given images.)
% Load images
im1 = imread("1.jpg");
im2 = imread("2.jpg");
im3 = imread("3.jpg");
  2. Detect and match features between image $I(n)$ and $I(n - 1)$.
  3. Estimate the geometric transformation $T(n)$ that maps $I(n)$ to $I(n - 1)$.
  4. Compute the transformation that maps $I(n)$ into the panorama image as $T(1) * T(2) * ... * T(n - 1) * T(n)$.
%% Register Image Pairs

% Initialize features for im1. Here we convert the original image to a grayscale image
grayImage = im2gray(im1);

points = detectSURFFeatures(grayImage);
[features, points] = extractFeatures(grayImage,points);

% Initialize all the transformations to the identity matrix. Note that the
% projective transformation is used here because the building images are fairly
% close to the camera. For scenes captured from a further distance, you can use
% affine transformations.
numImages = 3;
tforms(numImages) = projtform2d;

% Initialize variable to hold image sizes.
imageSize = zeros(numImages,2);

% Iterate over remaining image pairs
for n = 2:numImages
    % Store points and features for I(n-1).
    pointsPrevious = points;
    featuresPrevious = features;

    % Read I(n).
    if n == 2
        I = im2;
    else
        I = im3;
    end

    % Convert image to grayscale.
    grayImage = im2gray(I);

    % Save image size.
    imageSize(n,:) = size(grayImage);

    % Detect and extract SURF features for I(n).
    points = detectSURFFeatures(grayImage);
    [features, points] = extractFeatures(grayImage, points);

    % Find correspondences between I(n) and I(n-1).
    indexPairs = matchFeatures(features, featuresPrevious, 'Unique', true);

    matchedPoints = points(indexPairs(:,1), :);
    matchedPointsPrev = pointsPrevious(indexPairs(:,2), :);

    % Estimate the transformation between I(n) and I(n-1).
    tforms(n) = estgeotform2d(matchedPoints, matchedPointsPrev,"projective" ...
        , 'Confidence', 99, 'MaxNumTrials', 2000);

    % Compute T(1) * T(2) * ... * T(n-1) * T(n).
    tforms(n).A = double(tforms(n-1).A) * double(tforms(n).A);
end

% Compute the output limits for each transformation.
for i = 1:numel(tforms)
    [xlim(i,:), ylim(i,:)] = outputLimits(tforms(i), [1 imageSize(i,2)], [1 imageSize(i,1)]);
end

avgXLim = mean(xlim, 2);
[~,idx] = sort(avgXLim);
centerIdx = floor((numel(tforms)+1)/2);
centerImageIdx = idx(centerIdx);

Tinv = invert(tforms(centerImageIdx));
for i = 1:numel(tforms)
    tforms(i).A = Tinv.A * tforms(i).A;
end
  5. Utilise the imwarp function to align the images into the panorama.
%% Initialize the Panorama
for i = 1:numel(tforms)
    [xlim(i,:), ylim(i,:)] = outputLimits(tforms(i), [1 imageSize(i,2)], [1 imageSize(i,1)]);
end

maxImageSize = max(imageSize);

% Find the minimum and maximum output limits.
xMin = min([1; xlim(:)]);
xMax = max([maxImageSize(2); xlim(:)]);

yMin = min([1; ylim(:)]);
yMax = max([maxImageSize(1); ylim(:)]);

% Width and height of panorama.
width  = round(xMax - xMin);
height = round(yMax - yMin);

% Initialize the "empty" panorama.
panorama = zeros([height width 3], 'like', I);

%% Create the Panorama

blender = vision.AlphaBlender('Operation', 'Binary mask', ...
    'MaskSource', 'Input port');

% Create a 2-D spatial reference object defining the size of the panorama.
xLimits = [xMin xMax];
yLimits = [yMin yMax];
panoramaView = imref2d([height width], xLimits, yLimits);

% Create the panorama.
for i = 1:numImages

    if i == 1
        I = im1;
    elseif i == 2
        I = im2;
    else
        I = im3;
    end


    % Transform I into the panorama.
    warpedImage = imwarp(I, tforms(i), 'OutputView', panoramaView);
    figure();
    imshow(warpedImage);
    imwrite(warpedImage,['transform' , num2str(i), '.png'])
    % Generate a binary mask.
    mask = imwarp(true(size(I,1),size(I,2)), tforms(i), 'OutputView', panoramaView);

    % Overlay the warpedImage onto the panorama.
    panorama = step(blender, panorama, warpedImage, mask);
end

figure
imshow(panorama)
imwrite(panorama,"result.png")

5.1.1. Results

  1. Figure showing undistorted input images.
Original Input
1
2
3
Transform images
transform1
transform2
transform3
  2. Figure showing complete panorama.

result

  3. A written explanation of the steps taken in the report, stating which functions you used, why you used them and a short explanation of how they work.

Step 1: Use any preprocessing you like to manipulate the given images.

  • In this step, I use the built-in im2gray function to convert the RGB image to grayscale. im2gray works by removing hue and saturation information while preserving lightness. I use it to reduce the dimensionality of the image data and the cost of computation.

Step 2: Detect and match features between image $I(n)$ and $I(n - 1)$.

  • In this step, I use the built-in detectSURFFeatures function to detect features. detectSURFFeatures implements the Speeded-Up Robust Features (SURF) algorithm to find blob features. I use it because it is efficient compared to other algorithms; I tried others, such as Minimum Eigenvalue, KAZE and FAST, and their results were not ideal. I need to detect features in each image to prepare for feature matching.
  • I also use the built-in matchFeatures function to match features between image $I(n)$ and $I(n - 1)$. matchFeatures compares the extracted feature descriptors and returns the indices of mutually matching pairs (here with 'Unique' set to true). I use it to match features between images so that I can estimate the transformation matrix.

Step 3: Estimate the geometric transformation, $T(n)$, that maps $I(n)$ to $I(n - 1)$.

  • In this step, I use the built-in estgeotform2d function to estimate the geometric transformation. estgeotform2d estimates a 2-D geometric transformation between two images by mapping the inliers in the matched points from one image to the inliers in the matched points from the other. Here, I set the transformation type to projective.

Step 4: Compute the transformation that maps $I(n)$ into the panorama image as $T(1) * T(2) * ... * T(n - 1) * T(n)$. I do not use any function in this step.

Step 5: Utilise the imwarp function to align the images into the panorama.

  • In this step, I use the built-in imwarp function, which returns the image transformed according to the geometric transformation. Then I can overlay the transformed images onto the panorama.

5.2. Part 2

The panorama which has been produced is not a uniform shape. Write an algorithm from scratch that iteratively crops the image so that no black areas are included. Your algorithm should preserve as much of the non-black areas of the image as possible and work with any provided panorama.

  1. A figure showing the original panorama overlaid with (red) lines representing the cropped area

result_cutArea

  1. The cropped panorama

result_cut

5.2.1. Algorithm explanation

Step 1: Use the built-in im2gray and imbinarize functions to convert the RGB image to a binary image. See the figure below for the result.

biImage

Step 2: Use the built-in imfill function to fill the holes in the binary image. See the figure below for the result.

biImage_fillholes

Step 3: Set two scanlines $S_{up}$ and $S_{down}$ with length $l$, where $l$ equals the horizontal length of the original picture.

Step 4: Let the midpoints of the scanlines follow the vertical central axis of the panorama, scanning from the top and the bottom respectively. Once all the pixel values on a scanline are $1$, record its ordinate as $b_{up}$ or $b_{down}$. These are the upper and lower boundary coordinates of the clipping region.

Step 5: Now we know the upper and lower boundary coordinates. Assume the centre point of the panorama is $(c_x,c_y)$. Cast two rays from the points $(c_x, b_{up})$ and $(c_x, b_{down})$ horizontally to the left and to the right. Once a value of 0 is detected, record the ray endpoint positions $b_{left}$ and $b_{right}$ as the left and right boundaries. (Note: the left and right boundaries are computed independently; for example, confirming the left boundary does not stop the search for the right one.)

Step 6: Finally, we get the four vertex coordinates of the rectangular clipping region: $(b_{up}, b_{left})$, $(b_{up}, b_{right})$, $(b_{down}, b_{left})$, $(b_{down}, b_{right})$. Then we can crop the picture.

5.2.2. Source code

%% Task 1.1.2
% function  cutBackground

% function ICut = cutBackground(I)

% figure
% imshow(ICut)
% imwrite(ICut,"result_cut.png")

panorama = imread("result.png");
I = panorama;

grayImage = im2gray(I);
grayImage = imbinarize(grayImage,0.01);
% figure
% imshow(grayImage);
imwrite(grayImage,"biImage.png")
grayImage = imfill(grayImage,"holes");
% figure
% imshow(grayImage);
imwrite(grayImage,"biImage_fillholes.png")


[row, column] = size(grayImage);
columnCenter = round(column/2);


% size of the middle input image (3024 x 4032), used to set the scanline length
row_im2 = 3024;
column_im2 = 4032;

% up and low
width_im2 = round(column_im2/2);
left = columnCenter - width_im2;
right = columnCenter + width_im2;

iPixel = 1;
signal = 1;
while signal
    up = iPixel;
    a = all(grayImage(up, left:right));

    if a
        signal = 0;
    else
        iPixel = iPixel + 1;
    end
end

iPixel = 0;
signal = 1;
while signal
    low = row - iPixel;
    b = all(grayImage(low, left:right));

    if b
        signal = 0;
    else
        iPixel = iPixel + 1;
    end
end



check = grayImage(up:low,:);

% right and left
iPixel = 1;
signal_right = 1;
while signal_right
    right = columnCenter + iPixel;

    if grayImage(up, right) == 0 || grayImage(low, right) == 0 || right == column
        signal_right = 0;
    else
        iPixel = iPixel + 1;
    end

end

iPixel = 1;
signal_left = 1;
while signal_left
    left = columnCenter - iPixel;

    if grayImage(up, left) == 0 || grayImage(low, left) == 0 || left == 1
        signal_left = 0;
    else
        iPixel = iPixel + 1;
    end

end

figure
% draw 10-pixel-thick red lines marking the crop boundaries
% (lineImg avoids shadowing MATLAB's built-in line function)
lineImg = I;
lineImg(up:up+10,left:right, 1) = 255;
lineImg(up:up+10,left:right, 2) = 0;
lineImg(up:up+10,left:right, 3) = 0;

lineImg(low-10:low,left:right, 1) = 255;
lineImg(low-10:low,left:right, 2) = 0;
lineImg(low-10:low,left:right, 3) = 0;

lineImg(up:low,left:left+10, 1) = 255;
lineImg(up:low,left:left+10, 2) = 0;
lineImg(up:low,left:left+10, 3) = 0;

lineImg(up:low,right-10:right, 1) = 255;
lineImg(up:low,right-10:right, 2) = 0;
lineImg(up:low,right-10:right, 3) = 0;
imshow(lineImg);
imwrite(lineImg,"result_cutArea.png")


ICut = I(up:low,left:right,:);

figure
imshow(ICut)
imwrite(ICut,"result_cut.png")
% end

6. Robot Vision QA

Key words: horizontal, vertical, histogram, RANSAC, distance, direction, SIFT descriptor, inliers, transformation, detection, robust, pixel, eigenfaces, smoothing, enhancement, thresholding, localization, invariant, scale, second derivatives, dimension, opponent process theory

  1. With regard to human vision

(a) Describe the role of ON/OFF cells in the perceived enhancement of contrast in human vision. [6]

Answers: Two types of Ganglion cells: “on-center” and “off-center”.

  • On-center: stimulated when the center of its receptive field is exposed to light, and is inhibited when the surround is exposed to light.
  • Off-center cells have just the opposite reaction.
  • ON/OFF cells allow ganglion cells to transmit information not merely about whether photoreceptor cells are firing (Photoreceptors do not actually fire action potentials), but also about the differences in firing rates of cells in the center and surround.

  • ON/OFF cells allow transmission of information about contrast.

  • The size of the receptive field governs the spatial frequency of the information.

(b) Discuss the evolution of light-capturing devices (photocells) that allowed the detection of light to progress from 1D to 2D.

(c) Use schematics and diagrams to illustrate this evolution where appropriate.

  2. The Laplacian Operator is the two-dimensional equivalent of the second derivative used for edge detection. The formula for the Laplacian of a function $f(x,y)$ is $$ \nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} $$
(a) Using the discrete approximation below, applying the Laplacian to the image raster $I$ (with zero padding at the border) gives $I'$: $$ \nabla^2 \approx \left[ \begin{matrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{matrix} \right] ,\quad I = \left[ \begin{matrix} 8 & 9 & 9 \\ 9 & 10 & 11 \\ 10 & 11 & 12 \end{matrix} \right] ,\quad I' = \left[ \begin{matrix} 14 & 9 & 16 \\ 8 & 0 & 13 \\ 20 & 12 & 26 \end{matrix} \right] $$
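This can be checked in MATLAB (the mask is symmetric, so convolution and correlation coincide, and the 'same' option zero-pads the border):

lap = [0 -1 0; -1 4 -1; 0 -1 0];
I   = [8 9 9; 9 10 11; 10 11 12];
Iout = conv2(I, lap, 'same')   % reproduces I' above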

(b) To determine edges after applying the Laplacian operator, we cannot simply take the magnitude. What additional step is needed to find pixels that are edges?

Answer: Often, points that lie on an edge are detected by:

(1) Detecting the local maxima or minima of the first derivative.

(2) Detecting the zero-crossings of the second derivative.

(c) Why are 2nd order edge detectors, such as the Laplacian Operator shown above, better at detecting sharper edges in intensity images, as compared to 1st order based operators? You should provide a direct comparison between 1st order and 2nd order derivatives for full marks.

Answer: Edges correspond to local maxima of the first derivative, which are exactly the zero crossings of the second derivative. Detecting zero crossings localizes an edge more sharply than thresholding the first derivative, which responds over a broad band of pixels around the edge.

  3. With regard to human vision

(a) Describe the role and behaviour of the following in the visual system:

i. Rod Photoreceptors

Answer: Rod photoreceptors (~120 million) are extremely sensitive photosensors that can respond to a single photon. They are used almost exclusively for night vision and have poor spatial resolution, as many rods converge to the same neuron within the retina.

ii. Cone Photoreceptors

Answer: Cone photoreceptors (~6 million) are primarily responsible for colour recognition. They are active at higher light levels and give higher resolution, as each signal is processed by several neurons.

(b) Objects selectively absorb some wavelengths (colours) and reflect others.

i. State two different forms of colour vision proposed that allow humans to differentiate colours.

Answer:

  1. Trichromatic (three-colour) theory

  2. Opponent process theory

ii. Discuss these two forms of colour vision in detail, stating the pros and cons of each form. You should outline the mechanism for each theory that allows colour determination.

Answer:

  1. Trichromatic (three-colour) theory: any colour can be produced by appropriate mixing of the three primary colours (red, green and blue). It explains how we discriminate wavelengths 2 nm apart, how we can match a mixture of wavelengths to a single colour, and some types of colour blindness. But it does not account for colour blending: some colours blend while others do not; one can imagine bluish-green or yellowish-green, but NOT greenish-red or bluish-yellow.

  2. Opponent process theory: colour may be represented in the visual system as "opponent colours" (yellow, blue, red and green). Neurons respond to pairs of primary colours (red-green and yellow-blue); some respond in a centre-surround fashion, with response characteristics determined by the appropriate ganglion-cell connections. This explains what trichromatic theory cannot: why yellow behaves as a primary colour, and why some colours blend while others do not.

  4. For edge detection using intensity images, we often approximate the gradient of the intensity using masks.

(a) What is the advantage of the Sobel operator over the Roberts operator when used for edge detection?

Answer: The Sobel operator is less sensitive to noise: its larger $3 \times 3$ masks include a smoothing component, whereas the $2 \times 2$ Roberts masks respond to sharper edges but are more easily corrupted by noise.

(b) Why are second order edge detectors better at finding edges than first order detectors?

Answer: Edges correspond to local maxima of the first derivative, which are exactly the zero crossings of the second derivative. Detecting zero crossings localizes an edge more sharply than thresholding the first derivative, which responds over a broad band of pixels around the edge.

(c) Describe how the computational speed of applying a 2D Gaussian filter to an image raster can be improved.

Answer: We can decompose the 2D Gaussian into two 1D Gaussian filters: apply a 1D Gaussian filter to the image raster in the vertical direction first, then apply a 1D Gaussian filter in the horizontal direction. For a $k \times k$ filter this reduces the work per pixel from $k^2$ to $2k$ multiplications.

(d) Convolve the image Raster with the Mask shown below. You will only need to find the output corresponding to the sixteen highlighted central elements of the original image raster.

$$ \left[ \begin{matrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{matrix} \right] $$$$ \left[ \begin{matrix} 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \end{matrix} \right] $$$$ \left[ \begin{matrix} 0 & -1 & -3 & -3 \\ -1 & -3 & -3 & -1 \\ -3 & -3 & -1 & 0 \\ -3 & -1 & 0 & 0 \end{matrix} \right] $$
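The given answer applies the mask as written (correlation), which filter2 reproduces; conv2 would rotate the mask by 180° and so negate the signs here:

mask   = [1 2 1; 0 0 0; -1 -2 -1];
raster = [0 0 0 0 0 1; 0 0 0 0 1 1; 0 0 0 1 1 1;
          0 0 1 1 1 1; 0 1 1 1 1 1; 1 1 1 1 1 1];
out = filter2(mask, raster, 'valid')   % the 4x4 central responses shown above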

(e) What feature in the image raster does the mask in part (d) detect?

Answer: Horizontal edges, i.e. places where the intensity changes strongly in the vertical direction.

  5. You are asked to design and implement a secure entry device for the School of Computer Science, based on facial detection and recognition of each student and staff member. The system must be able to identify each individual member and allow appropriate access through secure doors. You should describe the technique that you would apply, together with the problems you believe you would encounter in such a system, so that you can:

(i) Gather the required information for processing.

(ii) Identify each individual member. You need to outline the details of your chosen method that will make this possible.

(iii) Determine and minimise the drawbacks of the suggested technique.

  6. Feature detection as used for object recognition and tracking needs to be robust to different types of invariance.

(i) Explain how the following types of invariance can be achieved (state all applicable algorithms): i. Illumination

Answer:

  • The easy way: normalise the intensities
  • Difference-based metrics (as in SIFT)

ii. Scale

Answer:

  • Pyramids
  • Scale space (DoG method)

iii. Rotation

Answer:

  • Rotate all features to the same orientation in a determined manner
  • Take a histogram of gradient directions
  • Rotate to the most dominant direction (and perhaps the second, if it is strong enough)

(ii) Corner detection is frequently used in motion detection and image registration. i. Explain why a corner is a better feature than an edge.

ii. Explain and outline how the Moravec operator can be used for corner detection.

(iii) Motion correspondence can be used to match features in one image with those in another, to estimate motion. State and explain the THREE principles of motion correspondence. Specifically, state how an algorithm is designed to ensure features adhere to these principles.

Answer:

  1. Discreteness: a measure of the distinctiveness of individual points.
  2. Similarity: a measure of how closely two points resemble one another.
  3. Consistency: a measure of how well a match conforms with nearby matches.

7. Lab 1

7.1. Default setting

Default functions

read_image.m

function image = read_image(base_dir, file)

% read_image(base_dir, file) - reads a gif file from the directory base_dir
%                        base_dir should be a string, and if it is the
%                        current directory it should be ''
% It also converts the image from uint8 to double

image = imread([base_dir, file]);

image = double(image);

show_image.m

function show_image(img)
% SHOW_IMAGE Image

figure % creates a new Figure window
colormap(gray);
imagesc(img);

Default parameters

sobel detectors

$$ Sobel_X = \left[ \begin{matrix} -1&0&1 \\ -2&0&2 \\ -1&0&1 \end{matrix} \right] $$ $$ Sobel_Y = \left[ \begin{matrix} 1&2&1 \\ 0&0&0 \\ -1&-2&-1 \end{matrix} \right] $$

Roberts detectors

$$ Roberts_A = \left[ \begin{matrix} 0&1 \\ -1&0 \end{matrix} \right] $$ $$ Roberts_B = \left[ \begin{matrix} -1&0 \\ 0&1 \end{matrix} \right] $$

7.2. Task 1: Create several different thresholds

Question:

What do you notice regarding the effect of changing the threshold?

Source Code

load sobel.mat

m = magnitude(sobelX,sobelY);

show_image(m);
title("m")

for i =20:10:40
show_image(m>i);
title("thresholds=" + i)
end


function m = magnitude(Gx,Gy)

shakey = read_image('','shakey.150.gif');

shakey_X = conv2(shakey,Gx,'valid');
shakey_Y = conv2(shakey,Gy,'valid');

m = (shakey_X.^2 + shakey_Y.^2).^(0.5);

end

Output

m_sobel
thresholds20_sobel
thresholds30_sobel
thresholds40_sobel

7.3. Task 2: Load up the Roberts operator

Question:

What do you notice regarding the difference between Sobel and Roberts

Source Code

load roberts.mat

m = magnitude(robertsA,robertsB);

show_image(m);
title("m")

for i =20:10:40
show_image(m>i);
title("thresholds=" + i)
end


function m = magnitude(Gx,Gy)

shakey = read_image('','shakey.150.gif');

shakey_X = conv2(shakey,Gx,'valid');
shakey_Y = conv2(shakey,Gy,'valid');

m = (shakey_X.^2 + shakey_Y.^2).^(0.5);

end

Output

m_roberts
thresholds20_roberts
thresholds30_roberts
thresholds40_roberts

7.4. Task 3: Replace the gradient magnitude with |Gx|+|Gy|

Question:

What do you notice regarding the difference between magnitude and absolute when calculating the edge?

Source Code

  • Note: Here I use the Roberts detectors.
load roberts.mat

m = magnitude(robertsA,robertsB);

show_image(m);
title("m")

for i =20:10:40
show_image(m>i);
title("thresholds=" + i)
end


function m = magnitude(Gx,Gy)

shakey = read_image('','shakey.150.gif');

shakey_X = conv2(shakey,Gx,'valid');
shakey_Y = conv2(shakey,Gy,'valid');

m = abs(shakey_X) + abs(shakey_Y);

end

Output

m_roberts_absolute
thresholds20_roberts_absolute
thresholds30_roberts_absolute
thresholds40_roberts_absolute

8. Lab 2

8.1. Default setting

Default functions

read_image.m

function image = read_image(base_dir, file)

% read_image(base_dir, file) - reads a gif file from the directory base_dir
%                        base_dir should be a string, and if it is the
%                        current directory it should be ''

image = imread([base_dir, file]);

image = double(image);

show_image.m

function show_image(img)
% SHOW_IMAGE Image

colormap(gray);
imagesc(img);

N.m

function p = N(m,s,x)

% N(m,s,x) - gives the Probability Density Function (pdf) of the Normal 
% Distribution with mean m, Standard Deviation (sd) s, for the value x.
% example: p = N(0,0.1,[-3:1:3]);
% Calculates pdf with mean = 0, sd = 0.1, for a vector of 7 elements long

p = 1/(s * sqrt(2* pi)) * exp(-(x-m).^2/(2*s^2));

Default parameters

sobel detectors

$$ Sobel_X = \left[ \begin{matrix} -1&0&1 \\ -2&0&2 \\ -1&0&1 \end{matrix} \right] $$ $$ Sobel_Y = \left[ \begin{matrix} 1&2&1 \\ 0&0&0 \\ -1&-2&-1 \end{matrix} \right] $$

8.2. Task 1

Question:

Can you describe the effect in comparison with applying the edge filter to the image directly?

Source Code

% load data
shakey = read_image('','shakey.150.gif');
load filters.mat


% The edge image without Gaussian noise filter, thresholds = 30
shakeyM = magnitude(shakey,sobelX,sobelY);
figure();
show_image(abs(shakeyM)>30);
title("Original edge image")

% The edge image with 3 by 3 Gaussian noise filter, thresholds = 30
shakeyGaussian3 = conv2(shakey,gaussian_filter_3x3,'valid');
shakeyM3 = magnitude(shakeyGaussian3,sobelX,sobelY);
figure();
show_image(abs(shakeyM3)>30);
title("gaussian filter 3x3")

% The edge image with 5 by 5 Gaussian noise filter, thresholds = 30
shakeyGaussian5 = conv2(shakey,gaussian_filter_5x5,'valid');
shakeyM5 = magnitude(shakeyGaussian5,sobelX,sobelY);
figure();
show_image(abs(shakeyM5)>30);
title("gaussian filter 5x5")


% Calculating edge
function m = magnitude(img,Gx,Gy)

shakey_X = conv2(img,Gx,'valid');
shakey_Y = conv2(img,Gy,'valid');

m = abs(shakey_X) + abs(shakey_Y);
end

Output

Gaussian0by0
Gaussian3by3
Gaussian5by5

Here I found that the noise level dropped after Gaussian smoothing.

8.3. Task 2

Question:

What is the effect of increasing the size of the Gaussian Filter (3x3 versus 5x5 for example)? What is the effect of changing the standard deviation s? Why do you see what you see?

Source Code

% load data
shakey = read_image('','shakey.150.gif');
load filters.mat

% The edge image without Gaussian noise filter, thresholds = 30
shakeyM = magnitude(shakey,sobelX,sobelY);
figure();
show_image(abs(shakeyM)>30);
title("Original edge image")

% Create Gaussian filter
m = 0; % mean
s = 5; % standard deviation
ValueCoordinates = -3 : 1.5 : 3;
GaussianCreated = N(m,s,ValueCoordinates);
GaussianFilter5 = GaussianCreated'*GaussianCreated;

ValueCoordinates = -6 : 1.5 : 6;
GaussianCreated = N(m,s,ValueCoordinates);
GaussianFilter9 = GaussianCreated'*GaussianCreated;

% The edge image with 5 by 5 Gaussian noise filter, thresholds = 30
shakeyGaussian5 = conv2(shakey,GaussianFilter5,'valid');
shakeyM5 = magnitude(shakeyGaussian5,sobelX,sobelY);

% The edge image with 9 by 9 Gaussian noise filter, thresholds = 30
shakeyGaussian9 = conv2(shakey,GaussianFilter9,'valid');
shakeyM9 = magnitude(shakeyGaussian9,sobelX,sobelY);

figure();
show_image(abs(shakeyM5)>30);
title("gaussian filter 5x5, edge")
figure();
show_image(shakeyGaussian5);
title("gaussian filter 5x5, standard deviation is " + s)

figure();
show_image(abs(shakeyM9)>30);
title("gaussian filter 9x9, edge")
figure();
show_image(shakeyGaussian9);
title("gaussian filter 9x9, standard deviation is " + s)



%% Calculating edge
function m = magnitude(img,Gx,Gy)

shakey_X = conv2(img,Gx,'valid');
shakey_Y = conv2(img,Gy,'valid');

m = abs(shakey_X) + abs(shakey_Y);

end

Output

Gaussian0by0
Gaussian5by5_S1
Gaussian5by5_S3
Gaussian5by5_S5

Here I found that as the standard deviation increases, the images get blurrier and the gradients between pixels decrease.

Gaussian5by5_S5_original
Gaussian9by9_S5_original
Gaussian5by5_S5
Gaussian9by9_S5
Gaussian5by5_S3
Gaussian9by9_S3

According to the above pictures, I found that the size of the Gaussian filter should change with the standard deviation: a larger standard deviation calls for a larger filter, as sketched below.
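A common rule of thumb (an assumption here, not from the lab sheet) is to size the filter to cover about three standard deviations on each side, which captures ~99.7% of the Gaussian mass; a sketch using the lab's N.m helper:

% Choose the filter size from the standard deviation
s = 5;                        % standard deviation
halfWidth = ceil(3*s);        % 3-sigma radius
x = -halfWidth:halfWidth;     % 31 sample points for s = 5
g = N(0, s, x);
g = g / sum(g);               % normalize so the weights sum to 1
GaussianFilter = g' * g;      % 31x31 2D filter via outer product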

8.4. Task 3

Question:

I mentioned the Laplacian of the Gaussian in the lecture. How could you combine the idea of the Laplacian operator with the idea of Gaussian smoothing? Try out your ideas.

Source Code

% load data
shakey = read_image('','shakey.150.gif');

% create filter
filterSize = 9;              % avoid shadowing MATLAB's built-in size()
valueRange = -4:1:4;
s = 1.4; % standard deviation

LoG_Filter = zeros(filterSize);

% calculate the value of LoG filter
for i = 1:filterSize
    x = valueRange(i);
    LoG_Filter(i,:) = LoG(s,x,valueRange);
end

% apply LoG filter
shakeyLoG = conv2(shakey,LoG_Filter,'valid');

figure();
show_image(shakeyLoG);
title("shakeyLoG")

%% LoG function

function VLoG = LoG(s,x,y)

VLoG = -1/(s^4*pi).*(1 - (x.^2+y.^2)./(s^2*2)) .* exp(-(x.^2+y.^2)./(s^2*2));

end

Output

Original
LoG

Source Code

% load data
shakey = read_image('','shakey.150.gif');
load filters.mat

% Create Gaussian filter
m = 0; % mean
s = 1; % standard deviation
ValueCoordinates = -3 : 1 : 3;
GaussianCreated = N(m,s,ValueCoordinates);
GaussianFilter7 = GaussianCreated'*GaussianCreated;

% The edge image with 7 by 7 Gaussian noise filter
shakeyGaussian7 = conv2(shakey,GaussianFilter7,'valid');

% use Sobel filter determine the edge
shakeyGaussian7Sobel = magnitude(shakeyGaussian7,sobelX,sobelY);

% use Laplacian filter determine the edge
shakeyGaussian7Laplacian = edge(shakeyGaussian7,'zerocross');

figure();
show_image(abs(shakeyGaussian7Sobel)>30);
title("Gaussian7Sobel, standard deviation is 1, thresholds = 30")

figure();
show_image(shakeyGaussian7Laplacian);
title("Gaussian7Laplacian, standard deviation is 1")



%% Calculating edge
function m = magnitude(img,Gx,Gy)

shakey_X = conv2(img,Gx,'valid');
shakey_Y = conv2(img,Gy,'valid');

m = abs(shakey_X) + abs(shakey_Y);

end

Output

Gaussian7Sobel
Gaussian7Laplacian

We can see that the outputs are totally different between the LoG filter and applying the Gaussian and the Laplacian separately.