SSD Object Detection in Real Time (Deep Learning and Caffe)

4 min read · Nov 4, 2020

In this article, we will talk about SSD Object Detection (its features, advantages, and drawbacks) and implement a MobileNet SSD model with Caffe, using OpenCV in Python.

Real Time Object Detection

What is Object Detection?

Object Detection in Computer Vision is what it sounds like: detecting the objects in an image, predicting their classes, and localizing the area each one occupies. Object Detection builds on image classification. Whether the classification is performed with neural networks or with more primitive classifiers, it is always the first step. Building on this, detection localizes all possible objects in a given frame.

Single Shot MultiBox Detector (SSD)

SSD Object Detection extracts feature maps using a base deep learning network, typically a CNN-based classifier, and applies convolution filters to those feature maps to detect objects. Our implementation uses MobileNet as the base network (other options include VGGNet, ResNet, and DenseNet).

SSD with VGG16 Net as Base Network

For a more in-depth explanation of how SSD Object Detection works, refer to this Medium article by Jonathan Hui.

What is Caffe?

Caffe is a deep learning framework developed by Berkeley AI Research (BAIR) and community contributors. It was designed with speed and efficiency in mind, which makes it well suited to tasks such as object detection. According to its developers, Caffe can process over 60 million images per day on a single NVIDIA K40 GPU, which works out to roughly 1 ms/image for inference and 4 ms/image for learning.

Do check out the Caffe GitHub and Caffe Website.

Code Implementation

Requirements

Python (ver 3.6) and OpenCV (ver 4.2)
Caffe MobileNet SSD model weights and prototxt definition here.

Directory Tree

Create a folder named Caffe and save the model weights and prototxt file in it
Create a Python script file named detectDNN.py

Importing libraries (Lines 1–8)

Constructing argument parsing (Lines 11–16)
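As a minimal sketch of what this step might look like, the parser below uses Python's standard argparse module. The flag names (--prototxt, --model, --confidence) and the default file names are assumptions made for illustration, not necessarily the ones used in the original script.

```python
import argparse


def build_parser():
    """Build the command-line parser (flag names and defaults are hypothetical)."""
    parser = argparse.ArgumentParser(description="MobileNet SSD detection with Caffe")
    parser.add_argument("--prototxt", default="Caffe/deploy.prototxt",
                        help="path to the Caffe prototxt definition (assumed file name)")
    parser.add_argument("--model", default="Caffe/model.caffemodel",
                        help="path to the Caffe model weights (assumed file name)")
    parser.add_argument("--confidence", type=float, default=0.5,
                        help="minimum confidence needed to keep a detection")
    return parser


# Passing an explicit list makes the parser easy to try without a terminal.
args = build_parser().parse_args(["--confidence", "0.7"])
print(args.prototxt, args.model, args.confidence)
```

In a real script, calling parse_args() with no arguments reads the values from the command line instead.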

Initializing the labels with object names and assigning a random color to each label (Lines 19–23)
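A small sketch of this step, assuming the model was trained on the 20 PASCAL VOC classes plus a background class (the label set commonly shipped with Caffe MobileNet SSD weights). If your weights were trained on a different dataset, the list must be changed to match. The names LABELS and COLORS are illustrative.

```python
import random

# Assumption: class order of a PASCAL VOC trained MobileNet SSD model.
LABELS = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle",
          "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse",
          "motorbike", "person", "pottedplant", "sheep", "sofa", "train",
          "tvmonitor"]

# Seeding keeps the colors stable between runs; drop it for fresh colors each time.
random.seed(42)

# One random color per label, stored as a tuple of three 0-255 integers.
COLORS = [tuple(random.randint(0, 255) for _ in range(3)) for _ in LABELS]

print(len(LABELS), "labels,", len(COLORS), "colors")
```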

Loading the MobileNet SSD model weights and prototxt definition, and initializing the video stream (Lines 25–33)
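One way to sketch this step is with OpenCV's cv2.dnn.readNetFromCaffe and cv2.VideoCapture. The helper names load_network and open_camera are hypothetical, and the original script may open the video stream differently.

```python
def load_network(prototxt_path, weights_path):
    """Load a Caffe model through OpenCV's dnn module."""
    import cv2  # imported lazily so the sketch can be read without OpenCV installed
    return cv2.dnn.readNetFromCaffe(prototxt_path, weights_path)


def open_camera(index=0):
    """Open a webcam stream; index 0 is usually the default camera."""
    import cv2
    capture = cv2.VideoCapture(index)
    if not capture.isOpened():
        raise RuntimeError(f"could not open camera {index}")
    return capture
```

With the model files in place, nn = load_network(prototxt_path, weights_path) would give the network object on which nn.forward() is called later.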

Reading input frames, resizing them, and extracting the dimensions of each frame (Lines 39–41)
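The resizing arithmetic can be illustrated without OpenCV. The sketch below assumes the frame is resized to a fixed width while keeping its aspect ratio; the helper name and the target width of 400 pixels are illustrative choices.

```python
def resized_dimensions(width, height, target_width):
    """Return (new_width, new_height) that preserves the aspect ratio."""
    if width <= 0 or height <= 0 or target_width <= 0:
        raise ValueError("dimensions must be positive")
    scale = target_width / width
    return target_width, int(round(height * scale))


# A 640x480 webcam frame scaled down to a width of 400 pixels.
new_w, new_h = resized_dimensions(640, 480, 400)
print(new_w, new_h)
```

With OpenCV, the frame height and width come from frame.shape[:2], and the resize itself would be cv2.resize(frame, (new_w, new_h)).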

Converting the frame to a blob and passing it through the Caffe model. detections = nn.forward() stores the output layer of the neural network (Lines 43–49)
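A hedged sketch of the blob conversion follows. The 300x300 input size, the scale factor 0.007843, and the mean 127.5 are values commonly used with MobileNet SSD and are assumed here; check them against your prototxt. cv2.dnn.blobFromImage subtracts the mean and then multiplies by the scale factor, which normalize_pixel reproduces for a single value.

```python
SCALE = 0.007843          # roughly 1 / 127.5 (assumed preprocessing value)
MEAN = 127.5              # assumed per-channel mean
INPUT_SIZE = (300, 300)   # assumed network input size


def normalize_pixel(value, scale=SCALE, mean=MEAN):
    """Mirror blobFromImage for one pixel value: subtract the mean, then scale."""
    return (value - mean) * scale


def frame_to_blob(frame):
    """Turn a BGR frame into a 4D blob ready for the network."""
    import cv2  # imported lazily so normalize_pixel works without OpenCV
    resized = cv2.resize(frame, INPUT_SIZE)
    # The mean is passed once per channel so all three channels are shifted.
    return cv2.dnn.blobFromImage(resized, SCALE, INPUT_SIZE, (MEAN, MEAN, MEAN))


# Pixel values 0..255 end up in approximately the range -1..1.
print(normalize_pixel(0), normalize_pixel(127.5), normalize_pixel(255))
```

The blob would then be fed to the network with nn.setInput(blob) before calling nn.forward().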

Looping over each detection. Storing the confidence, that is, the prediction percentage of each object for its label. Filtering out weak detections and storing the index ID of each remaining object (Lines 51–59)
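The filtering logic can be sketched on plain lists. This assumes each detection row has the layout OpenCV returns for SSD models, [image_id, class_id, confidence, x1, y1, x2, y2], with box coordinates normalized to the 0..1 range. The function name and the sample values are made up for illustration.

```python
def filter_detections(rows, min_confidence=0.5):
    """Keep (class_id, confidence, box) for rows above the confidence threshold."""
    kept = []
    for row in rows:
        confidence = float(row[2])
        if confidence > min_confidence:
            class_id = int(row[1])
            box = tuple(float(v) for v in row[3:7])
            kept.append((class_id, confidence, box))
    return kept


# Two made-up detections: one strong, one weak.
sample = [
    [0, 15, 0.92, 0.10, 0.20, 0.50, 0.90],
    [0, 7, 0.30, 0.55, 0.40, 0.95, 0.80],
]
strong = filter_detections(sample, min_confidence=0.5)
print(strong)
```

With a real network output of shape (1, 1, N, 7), the rows would be detections[0, 0].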

The next lines of code extract the localized coordinates of each object and draw a bounding box over the detected object, along with its label and confidence percentage (Lines 61–74)
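The coordinate handling might look like the sketch below, assuming the network returns box corners normalized to the 0..1 range. The helper names to_pixel_box and caption are hypothetical.

```python
def to_pixel_box(box, frame_width, frame_height):
    """Scale a normalized (x1, y1, x2, y2) box to pixels, clamped to the frame."""
    def clamp(value, upper):
        return max(0, min(int(round(value)), upper - 1))

    x1, y1, x2, y2 = box
    return (clamp(x1 * frame_width, frame_width),
            clamp(y1 * frame_height, frame_height),
            clamp(x2 * frame_width, frame_width),
            clamp(y2 * frame_height, frame_height))


def caption(label, confidence):
    """Build the text drawn above the box, for example 'person: 92.00%'."""
    return f"{label}: {confidence * 100:.2f}%"


pixel_box = to_pixel_box((0.10, 0.20, 0.50, 0.90), 400, 300)
text = caption("person", 0.92)
print(pixel_box, text)
```

With OpenCV, the box and text could then be drawn using cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2) and cv2.putText(frame, text, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2), where the thickness, font, and offset are arbitrary choices.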

Displaying the live stream with detections and bounding boxes, and listening for an escape command. Finally, reporting the FPS information and cleaning up (Lines 76–90)
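The FPS measurement can be sketched with the standard library alone. The FpsCounter class below is a hypothetical stand-in for whatever helper the original script uses, and the sleep call only simulates per-frame work.

```python
import time


class FpsCounter:
    """Count processed frames and report the average frames per second."""

    def __init__(self):
        self._start = None
        self._end = None
        self.frames = 0

    def start(self):
        self._start = time.perf_counter()
        return self

    def update(self):
        self.frames += 1

    def stop(self):
        self._end = time.perf_counter()

    def elapsed(self):
        end = self._end if self._end is not None else time.perf_counter()
        return end - self._start

    def fps(self):
        seconds = self.elapsed()
        return self.frames / seconds if seconds > 0 else 0.0


counter = FpsCounter().start()
for _ in range(5):
    time.sleep(0.01)   # stands in for reading and processing one frame
    counter.update()
counter.stop()
print(f"elapsed: {counter.elapsed():.2f}s, approx FPS: {counter.fps():.2f}")
```

In the real loop, each frame would be shown with cv2.imshow, the Esc key could be checked with cv2.waitKey(1) & 0xFF == 27, and cleanup would typically call release() on the capture object followed by cv2.destroyAllWindows().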

To execute the code, run the script from your project directory in the terminal.

Ref: https://github.com/amolikvivian/Caffe-SSD-Object-Detection

What are the drawbacks of Single Shot MultiBox Detector?

Although the SSD framework is faster than many comparable detectors, it has trouble detecting smaller objects (while still performing better than YOLO in this respect).

What alternative object detection frameworks can be used?

Apart from SSD, other frameworks can be used for object detection, the more popular ones being YOLO and Fast/Faster R-CNN. Each of the three has its own pros and cons; however, SSD is generally regarded as one of the fastest and most efficient among them. To learn more about YOLO and its various versions read here.


Written by Avik Das
