Strided Inference: small object detection in high resolution images
Performing small object detection on large images is a complex and challenging task. With the increasing prevalence of high-resolution images, accurately identifying and locating small objects within them has become crucial in various domains. Small objects often lack visual cues and can be easily overshadowed by the background or larger objects. To address these challenges, I introduce you to Strided Inference, a term coined by me and my colleague back in my BRIDGEi2i days. This approach enhances accuracy and enables the detection of tiny objects that would otherwise have been missed. In this blog, we will explore how it works and its results.
Inspired by the 2019 paper “The Power of Tiling for Small Object Detection” by Unel, Ozkalaycı, and Cıgla, we developed a Python module that uses Strided Inference for improved small object detection in large-scale images. Typically, when training a Convolutional Neural Network (CNN), the training images are resized to a smaller scale. At inference time a high-resolution image has to be resized the same way, which distorts it and throws away valuable detail: small objects shrink to a handful of pixels. Consequently, the network struggles to detect the targeted objects. By employing Strided Inference, our module gets more out of pre-trained CNNs or custom networks, enabling significantly better detection results in high-resolution images.
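To put a rough number on that loss, here is a small back-of-the-envelope sketch. The image size, object size, and 300x300 network input are made-up values for illustration, not figures from our project, and `resized_object_size` is a hypothetical helper.

```python
def resized_object_size(image_size, object_size, network_input):
    """Size of an object, in pixels, after the whole image is resized to the network input."""
    (img_w, img_h), (obj_w, obj_h), (net_w, net_h) = image_size, object_size, network_input
    # Each axis is scaled independently, which is also where the distortion comes from.
    return (obj_w * net_w / img_w, obj_h * net_h / img_h)


# Assumed example: a 40x90 pixel person in a 4000x3000 image, network input 300x300.
print(resized_object_size((4000, 3000), (40, 90), (300, 300)))  # (3.0, 9.0)
```

A person that was 40 pixels wide ends up 3 pixels wide, and the 4:3 image is squeezed into a square on top of that.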
How does it work?
In essence, the Strided Inference module relies on a reliable object detection model that is optimized for images of normal size. For our example use case, we employ the SSD_Mobilenet object detection model from OpenCV’s DNN module. By using one of its pre-trained networks, we can detect various objects within our large-scale image. The main requirement is that the object detection model being used can accurately identify the desired objects; in our case, people.
Step 1: Creating tiles
The Strided Inference module first divides the large image into smaller tiles, as demonstrated in the GIF. Notably, these tiles overlap with each other. This overlap is crucial because it prevents objects from being missed or cut in half at a tile boundary. As long as the overlap is at least as large as the objects of interest, each object remains intact within at least one tile, even if it is partially present in several others.
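The tiling step can be sketched in a few lines. This is a minimal illustration and not the code from our module: the function names, the 300-pixel tile, and the 200-pixel stride are assumptions chosen for the example. The overlap between neighbouring tiles is simply the tile size minus the stride.

```python
def tile_origins(length, tile, stride):
    """Start offsets of tiles along one axis, covering the whole axis."""
    if length <= tile:
        return [0]
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] + tile < length:
        # Add one more tile flush with the far edge so nothing is left uncovered.
        origins.append(length - tile)
    return origins


def make_tiles(width, height, tile=300, stride=200):
    """Return (x1, y1, x2, y2) boxes of overlapping tiles in image coordinates."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, stride)
        for x in tile_origins(width, tile, stride)
    ]


tiles = make_tiles(1000, 600)
print(len(tiles))  # 15
print(tiles[0])    # (0, 0, 300, 300)
print(tiles[-1])   # (700, 300, 1000, 600)
```

A smaller stride gives more overlap and more tiles, so it trades inference time for a lower chance of splitting an object.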
Step 2: Perform object detection on tiles
We now run a forward pass of our object detector on each tile. The detector is good at finding objects in normal-sized images like the ones it was trained on, and each tile is exactly that. As you see below, the red bounding box marks an object that might not have been detected if we only had non-overlapping tiles. At least one overlapping tile contains the whole object and is able to detect it.
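Here is a toy sketch of that per-tile loop. To keep it runnable without model files, the image is a plain list of rows and `toy_detector` is a hypothetical stand-in that just boxes the non-zero pixels; in practice you would slice a NumPy array and call your real detector (SSD_Mobilenet in our case) instead. All names here are made up for illustration.

```python
def crop(image, box):
    """Cut the tile (x1, y1, x2, y2) out of an image stored as a list of rows."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def detect_on_tiles(image, tiles, detect_fn):
    """Run detect_fn on every tile and keep each tile's box next to its detections."""
    return [(box, detect_fn(crop(image, box))) for box in tiles]


def toy_detector(tile):
    """Hypothetical stand-in for a real model: one box around the non-zero pixels."""
    points = [(x, y) for y, row in enumerate(tile) for x, v in enumerate(row) if v]
    if not points:
        return []
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [(min(xs), min(ys), max(xs) + 1, max(ys) + 1, 1.0)]  # box + score


# An 8x4 image with a 2x2 "object" sitting right on the x = 4 tile border.
image = [[0] * 8 for _ in range(4)]
for y in (1, 2):
    for x in (3, 4):
        image[y][x] = 1

tiles = [(0, 0, 4, 4), (2, 0, 6, 4), (4, 0, 8, 4)]
for box, detections in detect_on_tiles(image, tiles, toy_detector):
    print(box, detections)
```

The first and last tiles each see only half of the object, while the overlapping middle tile sees all of it. Note that the boxes are still in tile coordinates at this point.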
Step 3: Performing Non-Maximum Suppression to remove duplicates
The above image also hints at what our next step is and why it is required. As you see above, using overlapping tiles leads to duplicates, for example the person in the green bounding box or the one in blue. Of course we don’t want our final output to have duplicate bounding boxes, right? So the module uses good old Non-Maximum Suppression (NMS). First, we keep track of how the image was divided into tiles and use each tile’s coordinates to convert its bounding boxes back to the coordinates of the original image. At this point we have duplicates: more than one detection of the same object. Since the object looks identical in both tiles, the model gives almost exactly the same bounding box in most cases. So, using an IoU threshold of 0.9, we remove the duplicates.
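A minimal sketch of this step, assuming boxes are (x1, y1, x2, y2) tuples and each detection carries a confidence score. The function names are hypothetical, and this is a plain greedy NMS written for illustration, not necessarily what our module does internally.

```python
def to_global(box, tile_origin):
    """Shift a box from tile coordinates to original-image coordinates."""
    x1, y1, x2, y2 = box
    ox, oy = tile_origin
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.9):
    """Greedy NMS over (box, score) pairs: keep the best box, drop near-copies."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Made-up detections: the same person seen from two overlapping tiles, plus one more.
detections = [
    (to_global((250, 100, 290, 180), (0, 0)), 0.80),
    (to_global((50, 100, 90, 180), (200, 0)), 0.85),   # same person, other tile
    (to_global((150, 50, 180, 120), (200, 0)), 0.70),  # a different person
]
for box, score in nms(detections):
    print(box, score)
```

With a threshold as high as 0.9, only near-identical boxes are merged, so two different people standing close together are both kept.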
Step 4: Final result
What you eventually get is the collection of final detections on the original image. Here is the comparison:
Performing object detection without Strided Inference:
And when using Strided Inference:
This methodology was used in one of our solutions, and we saw precision and recall for some objects increase from 0.2 to 0.9, which is quite significant. We decided this might be really helpful for others, as it has a general use case, so we created a custom module that anyone can use. Here’s the link to the GitHub module. The steps to work with this module and use it on your own images are explained in the Trying_strided_inference notebook.
I hope this comes in handy for someone else as well. I later parallelized this code to work on multiple images simultaneously but didn’t test it end to end. Future work would be to test that parallel version and to process the tiles in memory instead of saving them to a temporary directory, which should make the module more efficient. I didn’t have much free time back in 2021, so maybe someone can fork the repository and raise some pull requests. I am very much open to seeing extensions to this. Thanks for reading!