20 Days of OpenCV - Day 1
This blog is based on Adrian Rosebrock's PyImageSearch post "Face detection with OpenCV and deep learning".
Introduction
We already know that OpenCV ships out-of-the-box with pretrained Haar cascades, which are used for face detection. But only a few know that it also has a somewhat hidden dnn module, which runs deep learning models for tasks such as object detection.
In this post we will use OpenCV's dnn module for face detection.
Object Detection
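The dnn module has been part of the main OpenCV repository since version 3.3. It supports a number of deep learning frameworks, such as TensorFlow, Caffe and Torch/PyTorch.
Usually there are 3 main families of models used for the object detection task.
- R-CNN - Regions with CNN features (and its successors Fast and Faster R-CNN). Two-stage, region-proposal-based detectors; accurate, but comparatively slow.
- YOLO - You Only Look Once. A single-pass detector; generally very fast, though often somewhat less accurate.
- SSD - Single Shot Detector. A balance between high accuracy and fast performance.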
Code
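In this blog we will use an SSD-based face detector; the Caffe model used in the referenced PyImageSearch post has a ResNet base network rather than MobileNet. Now let's get coding.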
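The snippets in this post read their settings from an `args` dictionary with the keys `prototxt`, `model`, `image` and `confidence`, but never show how it is built. A minimal sketch of one way to do it with `argparse` follows; the function name `parse_args`, the flag names and the default threshold of 0.5 are assumptions chosen to mirror those keys, not something taken from the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options into a plain dict (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Face detection with OpenCV's dnn module")
    parser.add_argument("--image", required=True,
                        help="path to the input image")
    parser.add_argument("--prototxt", required=True,
                        help="path to the Caffe deploy prototxt file")
    parser.add_argument("--model", required=True,
                        help="path to the pretrained Caffe model")
    # Assumed default: detections scoring at or below this value are ignored.
    parser.add_argument("--confidence", type=float, default=0.5,
                        help="minimum probability to keep a detection")
    # vars() turns the Namespace into a dict, so args["image"] works.
    return vars(parser.parse_args(argv))
```

In a script, `args = parse_args()` would then make `args["prototxt"]`, `args["model"]`, `args["image"]` and `args["confidence"]` available to the code that follows.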
Load the model
import cv2
import numpy as np

# args is assumed to be a dict of command-line options (for example, built with argparse)
# read the model architecture (prototxt) and load the weights (caffemodel)
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])
# load the input image and construct an input blob for the image
# by resizing to a fixed 300x300 pixels and then subtracting the per-channel mean
image = cv2.imread(args["image"])
(h, w) = image.shape[:2]
# arguments: image, scale factor, spatial size, mean values to subtract
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0, (300, 300), (104.0, 177.0, 123.0))
# set input
net.setInput(blob)
# get predictions
detections = net.forward()
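To make the blob step less of a black box, here is a rough NumPy sketch of what `cv2.dnn.blobFromImage` does to an image that has already been resized: subtract the per-channel mean, multiply by the scale factor, and reorder the array from height x width x channels to the batch x channels x height x width layout the network expects. The helper name `to_blob` is hypothetical, and the sketch assumes a 3-channel BGR image; it is an illustration, not a replacement for the OpenCV call.

```python
import numpy as np


def to_blob(image_bgr, mean=(104.0, 177.0, 123.0), scale=1.0):
    """Approximate blobFromImage for an already resized BGR image."""
    pixels = image_bgr.astype(np.float32)
    # Subtract the mean of each channel (broadcast over height and width).
    pixels -= np.array(mean, dtype=np.float32)
    # Apply the scale factor after the mean subtraction.
    pixels *= scale
    # HWC -> CHW, then add a leading batch dimension: (1, 3, H, W).
    return pixels.transpose(2, 0, 1)[np.newaxis, ...]


# Synthetic 300x300 grey image, used only to show the resulting shape.
dummy = np.full((300, 300, 3), 128, dtype=np.uint8)
blob = to_blob(dummy)
print(blob.shape)
```

The resizing itself is left to `cv2.resize`, as in the code above.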
Now we need to read the predictions
# iterate through all predictions
for i in range(0, detections.shape[2]):
    # get confidence for that detection
    confidence = detections[0, 0, i, 2]
    if confidence > args["confidence"]:
        # scale the normalized coordinates back to the image size
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        # unpack the corner points from the array
        (startX, startY, endX, endY) = box.astype("int")
        # add the confidence text and the rectangle
        text = "{:.2f}%".format(confidence * 100)
        y = startY - 10 if startY - 10 > 10 else startY + 10
        cv2.rectangle(image, (startX, startY), (endX, endY), (0, 0, 255), 2)
        cv2.putText(image, text, (startX, y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)
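The indexing in the loop above relies on the layout of the output array: for this kind of SSD model it has shape (1, 1, N, 7), where index 2 of each row is the confidence and indices 3 to 6 are the box corners as fractions of the image width and height. The sketch below does the same filtering and scaling in vectorized form on a hand-made array, so it can be tried without a model. The helper name `extract_boxes` and the clamping to the image bounds are additions for illustration, not part of the original script.

```python
import numpy as np


def extract_boxes(detections, width, height, min_confidence=0.5):
    """Return pixel boxes and scores for detections above the threshold."""
    rows = detections[0, 0]                  # shape (N, 7)
    keep = rows[:, 2] > min_confidence       # column 2 holds the confidence
    scale = np.array([width, height, width, height])
    # Columns 3..6 are (startX, startY, endX, endY) in the 0..1 range.
    boxes = (rows[keep, 3:7] * scale).astype("int")
    # Assumption: clamp the corners so that they stay inside the image.
    boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, width - 1)
    boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, height - 1)
    return boxes, rows[keep, 2]


# Two made-up detections: one confident, one below the threshold.
fake = np.array([[[[0, 1, 0.9, 0.25, 0.5, 0.75, 1.0],
                   [0, 1, 0.2, 0.10, 0.1, 0.20, 0.2]]]], dtype=np.float32)
boxes, scores = extract_boxes(fake, width=400, height=200)
print(boxes, scores)
```

Clamping is optional, but it keeps the rectangle and label inside the frame when the network predicts a box that extends slightly past the image border.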
Now show the final image
cv2.imshow("Output", image)
cv2.waitKey(0)
Now we go one step further and feed frames from our webcam to the network to get real-time predictions.
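cap = cv2.VideoCapture(0)
cv2.namedWindow("frame", cv2.WINDOW_NORMAL)
while True:
    # capture frame-by-frame
    ret, frame = cap.read()
    if not ret:
        # stop if no frame could be read from the camera
        break
    # resize frame for prediction
    image = cv2.resize(frame, (300, 300))
    # copy the above script here ......
    cv2.imshow("frame", image)
    if cv2.waitKey(1) >= 0:  # break on any key press
        break
# release the camera and close the window
cap.release()
cv2.destroyAllWindows()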