# Homework 7

In this homework, we will implement a simplified version of the object detection process. Note that the tests in the notebook are not comprehensive; the autograder will contain more tests.

# Part 1: HOG Representation (10 points)

In this section, we will compute the average HOG representation of human faces.

There are 31 aligned face images provided in the \face folder, all of the same size. We will compute an average face from these images and then compute a HOG feature representation of that averaged face.

Use the hog function provided by the skimage library to build a HOG representation of objects.
Implement the hog_feature function in detection.py.
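The averaging step followed by the HOG call can be sketched as below. This is a minimal sketch, not the reference solution: the function name `average_hog_sketch`, the 8-pixel cell size, the 2x2 block size, and the random stand-in images are all assumptions made for illustration.

```python
import numpy as np
from skimage.feature import hog

def average_hog_sketch(images, pixels_per_cell=8):
    """Average same-sized grayscale images, then describe the mean image with HOG."""
    # Stack into shape (n, H, W) and average over the image axis.
    stack = np.stack([np.asarray(img, dtype=float) for img in images])
    mean_image = stack.mean(axis=0)
    # Cell and block sizes here are illustrative choices, not values from the handout.
    features = hog(
        mean_image,
        pixels_per_cell=(pixels_per_cell, pixels_per_cell),
        cells_per_block=(2, 2),
    )
    return mean_image, features

# Stand-in data: 31 random 64x64 arrays instead of the images in the face folder.
rng = np.random.default_rng(0)
faces = [rng.random((64, 64)) for _ in range(31)]
avg_face, face_feature = average_hog_sketch(faces)
```

With 64x64 inputs, 8-pixel cells, and 2x2 blocks, the descriptor is a flat vector covering 7x7 overlapping blocks of 36 values each.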

HOG representation

[Feature Detection] The HOG Feature Algorithm

# Part 2: Sliding Window (30 points)

Implement the sliding_window function so that a window of a given size slides across an image. At every location, the window is scored to check whether an object is detected there. These scores form a response map, from which you can find the location of the window with the highest HOG score.
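The scan described above can be sketched as follows. This is an illustrative sketch only: `sliding_window_sketch` and its parameters are hypothetical, and the descriptor used in the demo is a normalized pixel vector standing in for HOG so that the example stays self-contained. The response map here has one entry per window position rather than one per pixel.

```python
import numpy as np

def normalized_pixels(patch):
    """Stand-in descriptor (NOT HOG): zero-mean, unit-length pixel vector."""
    v = patch.ravel() - patch.mean()
    return v / (np.linalg.norm(v) + 1e-8)

def sliding_window_sketch(image, template_feature, feature_fn, step, window):
    """Score every window by the dot product of its feature with the template feature."""
    win_h, win_w = window
    rows = range(0, image.shape[0] - win_h + 1, step)
    cols = range(0, image.shape[1] - win_w + 1, step)
    response = np.zeros((len(rows), len(cols)))
    best = (-np.inf, 0, 0)  # (score, top row, left column)
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            patch = image[r:r + win_h, c:c + win_w]
            score = float(np.dot(feature_fn(patch), template_feature))
            response[i, j] = score
            if score > best[0]:
                best = (score, r, c)
    return best, response

# Demo: hide an 8x8 random target in an otherwise empty image and find it again.
rng = np.random.default_rng(0)
target = rng.random((8, 8))
image = np.zeros((40, 40))
image[12:20, 20:28] = target
(best_score, best_r, best_c), response = sliding_window_sketch(
    image, normalized_pixels(target), normalized_pixels, step=4, window=(8, 8)
)
```

In the assignment, the feature function would be the HOG descriptor and the template would be the averaged face feature from Part 1.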

The sliding window successfully found the human face in the example above. However, in the cell below we change only the scale of the image, and you can see that the sliding window no longer works once the scale changes.

# Part 3: Image Pyramids (30 points)

In order to make sliding window work for different scales of images, you need to implement image pyramids where you resize the image to different scales and run the sliding window method on each resized image. This way you scale the objects and can detect both small and large objects.

### 3.1 Image Pyramid (10 points)

Implement the pyramid function in detection.py. It creates a pyramid of images at different scales. Run the following code, and you will see the original image shrink step by step until it reaches a minimum size.
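One way to build such a pyramid is sketched below, assuming `skimage.transform.rescale` for the resizing. The name `pyramid_sketch`, the 0.9 scale factor, and the 32x32 minimum size are assumptions for illustration, not values taken from the assignment.

```python
import numpy as np
from skimage.transform import rescale

def pyramid_sketch(image, scale=0.9, min_size=(32, 32)):
    """Return a list of (scale, image) pairs, shrinking until min_size is reached."""
    levels = [(1.0, image)]
    current_scale = 1.0
    while True:
        current_scale *= scale
        # Always rescale from the original to avoid accumulating interpolation error.
        resized = rescale(image, current_scale)
        if resized.shape[0] < min_size[0] or resized.shape[1] < min_size[1]:
            break
        levels.append((current_scale, resized))
    return levels

# Demo on a random 64x64 grayscale image.
rng = np.random.default_rng(0)
levels = pyramid_sketch(rng.random((64, 64)))
for level_scale, level_image in levels:
    print(round(level_scale, 3), level_image.shape)
```

Keeping the scale next to each image makes it easy to map a detection in a resized level back to coordinates in the original image.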

### 3.2 Pyramid Score (20 points)

After getting the image pyramid, we will run sliding window on all the images to find a place that gets the highest score. Implement pyramid_score function in detection.py. It will return the highest score and its related information in the image pyramids.
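The search over pyramid levels can be sketched as below. Everything here is a hypothetical stand-in: `pyramid_score_sketch` and `best_window` are illustrative names, the similarity measure is a normalized pixel correlation rather than HOG, and the two-level pyramid is built by simple subsampling to keep the example dependency-free.

```python
import numpy as np

def unit(v):
    """Zero-mean, unit-length version of a patch (stand-in for a HOG descriptor)."""
    v = v.ravel() - v.mean()
    return v / (np.linalg.norm(v) + 1e-8)

def best_window(image, template, step=1):
    """Return (score, row, col) of the window most similar to the template."""
    th, tw = template.shape
    t = unit(template)
    best = (-np.inf, 0, 0)
    for r in range(0, image.shape[0] - th + 1, step):
        for c in range(0, image.shape[1] - tw + 1, step):
            score = float(unit(image[r:r + th, c:c + tw]) @ t)
            if score > best[0]:
                best = (score, r, c)
    return best

def pyramid_score_sketch(levels, template):
    """Run the window search on every level and keep the overall best result."""
    best = None
    for scale, level in levels:
        score, r, c = best_window(level, template)
        if best is None or score > best["score"]:
            best = {"score": score, "row": r, "col": c, "scale": scale}
    return best

# Demo: the object appears at twice the template size, so only the 0.5 level matches.
rng = np.random.default_rng(1)
template = rng.random((4, 4))
image = np.zeros((32, 32))
image[8:16, 16:24] = np.kron(template, np.ones((2, 2)))
levels = [(1.0, image), (0.5, image[::2, ::2])]
best = pyramid_score_sketch(levels, template)

# Map the detection back to original-image coordinates.
orig_row = int(best["row"] / best["scale"])
orig_col = int(best["col"] / best["scale"])
```

Dividing the location by the winning scale recovers where the object sits in the full-size image.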

From the example above, we can see that the image pyramid fixes the scaling problem. In the example below, we will try another image and implement a deformable parts model.

# Part 4: Deformable Parts Detection

To solve the problem above, you will implement a deformable parts model (DPM) in this section and apply it to human faces.

The first step is to get a detector for each part of the face, including left eye, right eye, nose and mouth.

For example, for the left eye, we have provided the ground-truth location of the left eye in each image in the \face directory. These locations are stored in the lefteyes array with shape (n,2), where each row is the (r,c) location of the center of a left eye. You will then find the average HOG representation of the left eyes in the images.
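The cropping and averaging step for one part can be sketched as follows. This is a minimal sketch under stated assumptions: `average_part_patch` is a hypothetical helper, the 8x8 part size is arbitrary, and the images and centers are synthetic. The averaged patch would then be passed to the HOG function to obtain the part detector.

```python
import numpy as np

def average_part_patch(images, centers, part_shape):
    """Crop a part_shape patch around each (r, c) center and average the patches."""
    ph, pw = part_shape
    patches = []
    for img, (r, c) in zip(images, centers):
        top, left = int(r) - ph // 2, int(c) - pw // 2
        # Skip parts whose patch would fall outside the image.
        if top < 0 or left < 0 or top + ph > img.shape[0] or left + pw > img.shape[1]:
            continue
        patches.append(img[top:top + ph, left:left + pw])
    return np.mean(patches, axis=0)

# Synthetic stand-ins: image i is filled with the value i, so the mean patch is 2.0.
images = [np.full((32, 32), float(i)) for i in range(5)]
lefteyes_demo = np.array([[10, 12], [11, 12], [10, 13], [12, 11], [10, 10]])
avg_patch = average_part_patch(images, lefteyes_demo, part_shape=(8, 8))
```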

A Detailed Explanation of How DPM (Deformable Part Model) Works

Run through the following code to get a detector for the left eye.

Run through the following code to get a detector for the right eye.

Run through the following code to get a detector for the nose.

Run through the following code to get a detector for the mouth.

# Part 5: Human Parts Location (10 points)

Implement compute_displacement to get an average shift vector mu and a standard deviation sigma for each part of the face. The vector mu is the displacement between the main center, i.e., the center of the face, and the center of the part.
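A minimal sketch of the statistics involved is shown below. The sign convention is an assumption: the displacement is taken as face center minus part center, so that adding mu to a part location lands on the face center. The name `compute_displacement_sketch` and the sample coordinates are hypothetical.

```python
import numpy as np

def compute_displacement_sketch(part_centers, face_shape):
    """Mean and standard deviation of the offsets from each part center to the face center."""
    face_center = np.array(face_shape, dtype=float) / 2.0
    # One (row, col) offset per training image; sign convention is an assumption.
    d = face_center - np.asarray(part_centers, dtype=float)
    mu = d.mean(axis=0)
    sigma = d.std(axis=0)
    return mu, sigma

# Three made-up part centers inside a 60x50 face image (center at row 30, column 25).
part_centers = np.array([[20, 15], [22, 17], [18, 16]])
mu, sigma = compute_displacement_sketch(part_centers, face_shape=(60, 50))
```

The offsets here are (10, 10), (8, 8), and (12, 9), so mu is (10, 9); sigma measures how much the part position varies from face to face along each axis.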


After getting the shift vectors, we can run our detector on a test image. We will first run the following code to detect each part (left eye, right eye, nose, and mouth) in the image. You will see a response map for each of them.

After getting the response maps for each part of the face, we will shift these maps so that they all have the same center as the face. We calculated the shift vector mu in compute_displacement, so we shift each map by its vector mu. Implement the shift_heatmap function in detection.py.
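The shift itself can be sketched with plain array slicing, as below. This is one possible approach rather than the expected solution: `shift_heatmap_sketch` is a hypothetical name, mu is rounded to whole pixels, and cells that are shifted in from outside the map are filled with zeros.

```python
import numpy as np

def shift_heatmap_sketch(heatmap, mu):
    """Translate a heatmap by (mu[0], mu[1]) pixels, filling vacated cells with zeros."""
    dr, dc = int(round(float(mu[0]))), int(round(float(mu[1])))
    H, W = heatmap.shape
    shifted = np.zeros_like(heatmap)
    # A shift larger than the map leaves nothing to copy.
    if abs(dr) >= H or abs(dc) >= W:
        return shifted
    src_rows = slice(max(0, -dr), min(H, H - dr))
    dst_rows = slice(max(0, dr), min(H, H + dr))
    src_cols = slice(max(0, -dc), min(W, W - dc))
    dst_cols = slice(max(0, dc), min(W, W + dc))
    shifted[dst_rows, dst_cols] = heatmap[src_rows, src_cols]
    return shifted

# A single peak at (1, 1) moved by mu = (2, 1) ends up at (3, 2).
heatmap = np.zeros((5, 5))
heatmap[1, 1] = 1.0
shifted = shift_heatmap_sketch(heatmap, mu=(2, 1))
```

Unlike a wrap-around roll, this keeps responses near one border from reappearing at the opposite border.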

# Part 6: Gaussian Filter (20 points)

## 6.1 Gaussian Filter

In this part, apply a Gaussian filter to each part heatmap, blurring with a kernel of standard deviation sigma, and then add the blurred part heatmaps to the heatmap of the face. On the combined heatmap, find the maximum value and its location. You can use the Gaussian filter function provided by skimage to implement gaussian_heatmap.
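The blur-and-sum step can be sketched as below, assuming `skimage.filters.gaussian` for the blur. The function name `gaussian_heatmap_sketch` and the toy heatmaps are assumptions made for illustration; the part heatmaps are taken to be already shifted to the face center.

```python
import numpy as np
from skimage.filters import gaussian

def gaussian_heatmap_sketch(face_heatmap, part_heatmaps, sigmas):
    """Blur each part heatmap by its sigma, add it to the face heatmap, and locate the peak."""
    combined = np.array(face_heatmap, dtype=float)
    for heatmap, sigma in zip(part_heatmaps, sigmas):
        # sigma may be a scalar or one value per axis (rows, cols).
        combined += gaussian(np.asarray(heatmap, dtype=float), sigma=sigma)
    r, c = np.unravel_index(np.argmax(combined), combined.shape)
    return combined, int(r), int(c)

# Toy example: the face detector alone prefers (5, 5), but both parts vote for (10, 10).
face = np.zeros((21, 21))
face[5, 5] = 1.0
face[10, 10] = 0.9
part_a = np.zeros((21, 21))
part_a[10, 10] = 1.0
part_b = np.zeros((21, 21))
part_b[10, 10] = 1.0
combined, peak_r, peak_c = gaussian_heatmap_sketch(
    face, [part_a, part_b], sigmas=[1.0, (1.0, 1.5)]
)
```

The blur lets a part contribute even when it is slightly off its expected position, with a larger sigma tolerating larger deviations.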

## 6.2 Result Analysis (10 points)

Does your DPM work on detecting human faces? Can you think of a case where DPM may work better than the detector we had in part 3 (sliding window + image pyramid)? You can also have examples that are not faces.

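The principle behind DPM is easy to understand: it sums the response values of the individual parts and determines which region has the highest total response.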

## Extra Credit (1 point)

You have tried detecting one face in the image, and the next step is to extend this to detecting multiple occurrences of the object. For example, in the following image, how do you detect more than one face from your response map? Implement the function detect_multiple, and write code to visualize your detected faces in the cell below.
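One possible approach, sketched below, is a greedy form of non-maximum suppression: repeatedly take the strongest response, record it, and blank out a window-sized neighborhood around it before looking for the next peak. This is only a sketch of that idea; the name `detect_multiple_sketch`, the threshold, and the cap on detections are all assumptions.

```python
import numpy as np

def detect_multiple_sketch(response, window, threshold, max_detections=10):
    """Greedy peak picking: take the best score, suppress its neighborhood, repeat."""
    resp = np.array(response, dtype=float)  # work on a copy
    h, w = window
    detections = []
    for _ in range(max_detections):
        r, c = np.unravel_index(np.argmax(resp), resp.shape)
        score = resp[r, c]
        if score < threshold:
            break
        detections.append((int(r), int(c), float(score)))
        # Suppress a window-sized region centered on the accepted peak.
        r0, r1 = max(0, r - h // 2), min(resp.shape[0], r + h // 2 + 1)
        c0, c1 = max(0, c - w // 2), min(resp.shape[1], c + w // 2 + 1)
        resp[r0:r1, c0:c1] = -np.inf
    return detections

# Toy response map: two separate faces plus a weaker duplicate next to the first one.
response = np.zeros((30, 30))
response[5, 5] = 1.0
response[6, 6] = 0.95   # same face, should be suppressed
response[20, 22] = 0.8
detections = detect_multiple_sketch(response, window=(9, 9), threshold=0.5)
```

The threshold controls how many weak responses are accepted as faces, and the window size controls how close two accepted detections may be.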