Saturday, November 2, 2019

Real-Time Object Tracking with YOLOv3 and Deep SORT

YOLO (You Only Look Once) version 3 is a model for object detection. So what is object detection? It is a technique for identifying the location of objects in an image. If an image contains a single object and we want to find where it is, the task is usually called object localization. If an image contains multiple different objects, we need to determine where each object is located in the image and also classify it.

Earlier approaches such as R-CNN, Faster R-CNN, Mask R-CNN, and their variants perform this task in multiple stages. They are hard to optimize and slow to run because the pipeline is made of several components, some of which must be trained separately. Single-shot detectors such as SSD and YOLOv3 instead do it all with a single neural network.

YOLOv3 makes predictions at three scales, obtained by down-sampling the dimensions of the input image by factors of 32, 16, and 8 respectively.
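As a quick check of that arithmetic, the sketch below computes the grid size at each scale. The 416x416 input size and the three anchor boxes per grid cell are assumptions taken from the commonly used YOLOv3 configuration, not from this post, and the function names are hypothetical.

```python
def yolo_v3_grid_sizes(input_size, strides=(32, 16, 8)):
    """Return the side length of the prediction grid at each scale."""
    return [input_size // stride for stride in strides]


def total_predicted_boxes(input_size, anchors_per_cell=3):
    """Count candidate boxes over all scales (assumes square grids)."""
    return sum(g * g * anchors_per_cell for g in yolo_v3_grid_sizes(input_size))


print(yolo_v3_grid_sizes(416))      # [13, 26, 52]
print(total_predicted_boxes(416))   # 10647
```

The coarse 13x13 grid is the one best suited to large objects, while the fine 52x52 grid helps with small ones.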

[Image: real-time object tracking]

How can we use this for Object tracking?

Counting objects (people, cars, etc.) manually is tedious and error-prone, and analyzing massive datasets of surveillance video by hand is often impractical. Automated object tracking is one of the primary capabilities needed for computerized analysis of such videos. Object tracking and video analysis play a crucial role in many applications, including traffic safety and intelligent monitoring. Real-world footage complicates the problem with challenges such as changes in posture, partial occlusions, background clutter, and illumination changes.

For object tracking, we use the Deep SORT algorithm. We start with all the detections in a frame and give each one an ID. In the following frame, we try to carry each object's ID forward. If an object moves out of the frame, its ID is dropped. If a new object appears, it starts off with a fresh ID.
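The ID lifecycle described above can be sketched with a toy tracker. This is not Deep SORT itself: it swaps in greedy nearest-centre matching to keep the example short, and the class name, the `max_distance` radius, and the sample coordinates are all hypothetical.

```python
import math


class SimpleIdTracker:
    """Toy tracker using greedy nearest-centre matching (not Deep SORT)."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance  # hypothetical matching radius in pixels
        self.next_id = 0
        self.tracks = {}                  # track ID -> last known (x, y) centre

    def update(self, centres):
        """Match this frame's detection centres to IDs; return {id: centre}."""
        unmatched = list(centres)
        updated = {}
        for track_id, old in self.tracks.items():
            if not unmatched:
                break
            nearest = min(
                unmatched,
                key=lambda c: math.hypot(c[0] - old[0], c[1] - old[1]),
            )
            if math.hypot(nearest[0] - old[0], nearest[1] - old[1]) <= self.max_distance:
                updated[track_id] = nearest   # carry the ID forward
                unmatched.remove(nearest)
        for centre in unmatched:              # new objects get fresh IDs
            updated[self.next_id] = centre
            self.next_id += 1
        self.tracks = updated                 # IDs without a match are dropped
        return updated


tracker = SimpleIdTracker()
print(tracker.update([(10, 10), (200, 200)]))  # {0: (10, 10), 1: (200, 200)}
print(tracker.update([(12, 11), (400, 50)]))   # {0: (12, 11), 2: (400, 50)}
```

In the second frame, ID 0 is carried forward, ID 1 is dropped because nothing appears near its last position, and the new object receives ID 2.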


Infrastructure Planning - Governments, industry, and businesses use object counting and tracking to learn things such as how crowded public places are with people and vehicles at a given time. By analyzing this data, governments can redesign roads and companies can adapt their infrastructure.

Safety - Computer vision is very helpful when searching for someone who is lost in a natural disaster or a crowded area, or who is stuck in a remote location.

Retail - Inventory management, optimizing store layout, understanding peak times, and potentially even protecting against theft in retail stores.

Security - Monitoring people in crowded places such as shopping malls, airports, railway stations, and tourist sites using CCTV cameras, which can help prevent criminal activities, including on roads.

What is the Deep SORT (Simple Online and Realtime Tracking) Algorithm?

In the Deep SORT algorithm, tracking is based not just on distance and velocity but also on what the person looks like. Deep SORT adds this by computing deep features for every bounding box and factoring the similarity between those features into the tracking logic.
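To make the idea of feature similarity concrete, here is a minimal sketch that compares appearance features with cosine distance. The three-element feature vectors and the function names are hypothetical; in a real setup the features would come from an appearance network and would be much longer.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def closest_track(detection_feature, track_features):
    """Return the track ID whose stored feature is most similar."""
    return min(
        track_features,
        key=lambda tid: cosine_distance(detection_feature, track_features[tid]),
    )


# Hypothetical stored features for two existing tracks
stored = {1: [1.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0]}
print(closest_track([0.9, 0.1, 0.0], stored))  # 1
```

A small distance suggests the detection shows the same person as the track, even when position alone would be ambiguous.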
It tracks well largely because it also uses a Kalman filter and the Hungarian algorithm.

1. The Hungarian algorithm solves the assignment problem: given a cost for pairing each existing track with each detection in the current frame, it finds the pairing with the lowest total cost. This tells us whether an object in the current frame is the same as one in the previous frame. It is used for association and ID attribution.

2. A Kalman filter is an algorithm that predicts future positions from the current state. It can also estimate the current position better than the raw sensor measurement alone. It is used to improve the association.
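The two building blocks above can be illustrated with a small standard-library sketch. Two caveats: `best_assignment` is a brute-force stand-in that returns the same optimal pairing the Hungarian algorithm would, but only scales to tiny matrices, and `KalmanFilter1D` tracks a single coordinate with a constant-velocity model. The names, noise values, and cost matrix are assumptions for illustration, not the settings used by Deep SORT.

```python
from itertools import permutations


def best_assignment(cost):
    """Return, for each track i, the index of the detection assigned to it.

    Brute-force stand-in for the Hungarian algorithm: it tries every
    permutation, so it is only practical for very small square matrices.
    """
    n = len(cost)
    return list(min(
        permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    ))


class KalmanFilter1D:
    """Constant-velocity Kalman filter for one coordinate (e.g. a box centre x)."""

    def __init__(self, position, process_var=0.01, measurement_var=1.0):
        self.p, self.v = float(position), 0.0   # state: position and velocity
        self.P = [[1.0, 0.0], [0.0, 100.0]]     # velocity starts very uncertain
        self.q, self.r = process_var, measurement_var

    def predict(self, dt=1.0):
        """Advance the state by dt and return the predicted position."""
        (a, b), (c, d) = self.P
        self.p += dt * self.v
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]
        return self.p

    def update(self, measurement):
        """Correct the state with a measured position; return the new estimate."""
        (a, b), (c, d) = self.P
        residual = measurement - self.p
        s = a + self.r                  # innovation variance
        k0, k1 = a / s, c / s           # Kalman gain for position and velocity
        self.p += k0 * residual
        self.v += k1 * residual
        # P <- (I - K H) P with H = [1, 0]
        self.P = [
            [(1 - k0) * a, (1 - k0) * b],
            [c - k1 * a, d - k1 * b],
        ]
        return self.p


# Rows are tracks, columns are detections, entries are matching costs
print(best_assignment([[4, 1, 3], [2, 0, 5], [3, 2, 2]]))  # [1, 0, 2]

kf = KalmanFilter1D(position=0.0)
for z in [1.0, 2.0, 3.0, 4.0, 5.0]:     # object moving one unit per frame
    kf.predict()
    kf.update(z)
print(kf.predict())                     # predicted position for the next frame
```

In a tracker, the Kalman prediction for each track would feed the cost matrix, and the assignment step would then decide which detection continues which track.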


Now let's get started with the implementation

For the implementation of object tracking with YOLOv3, feel free to check out my GitHub repo here.

Let's start by setting up the Deep SORT algorithm from the Deep SORT GitHub repository. You can check out that repository from here.

[Image: steps to follow for object tracking]

Running these commands will automatically clone the repo and set it up in your directory.

Next, download the YOLO weights from my Google Drive. With this link, you can download the weights and place them in the main directory to use them locally.

Next, set up the YOLO weights with the help of the deep_sort package.

[Image: steps to follow for object tracking]

Next, you can take any video from the internet to check the output of the model. I used a video from the Active Vision Laboratory at Oxford University; you can find that video at this link. I converted the video to mp4 using ffmpeg. If the ffmpeg command does not work on your machine, follow this site to install it. The sample code is here:

[Image: steps to follow for object tracking]

With the help of deep_sort, I used four variables to keep track of the four coordinates that define each bounding box. To keep only the correct bounding boxes, I used a threshold to filter out the duplicates.
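One common way to filter duplicate boxes with a threshold is non-maximum suppression, sketched below. This is an assumption about the kind of filtering meant here, not the code from the post: the `(x1, y1, x2, y2)` box layout, the 0.5 overlap threshold, and the function names are chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a stronger kept box too much
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box overlaps the first by more than the threshold, so it is treated as a duplicate and dropped, while the distant third box is kept.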

[Image: steps to follow for object tracking]

Finally, the output.avi video file is saved in your directory.

[Image: real-time object tracking]

Problems with Deep Sort

1. If the bounding boxes are too big, too much background is captured in the features, reducing the effectiveness of the algorithm.

2. If people are dressed similarly, as happens on a sports team, their features can be very similar, which can lead to ID switching.