Object detection technology has significantly evolved over the years, and one of the leading architectures driving this innovation is YOLO (You Only Look Once).
YOLO has been instrumental in making object detection both fast and accurate, and its various iterations (YOLOv5, YOLOv7, YOLOv8, etc.) have continuously pushed the limits of what’s possible.
However, the team at Deci has now introduced YOLO-NAS, the latest advancement in the YOLO series, and it outperforms previous YOLO versions on the accuracy-latency trade-off.
In this blog, we’ll explore how YOLO-NAS performs against other YOLO models, its architecture, training methods, and the steps to run inference using YOLO-NAS models.
YOLO-NAS brings impressive improvements over its predecessors:
While these gains may seem incremental compared to older models, it's important to understand that YOLO-NAS strikes an excellent balance between speed and accuracy, making it a formidable choice for real-time applications where both latency and precision are critical.
YOLO-NAS is designed to overcome some of the key limitations in the existing YOLO models, such as inadequate quantization support and suboptimal accuracy-latency trade-offs.
Deci has introduced innovative approaches that allow YOLO-NAS to excel in these areas, pushing the boundaries of real-time object detection.
Key highlights include:
YOLO-NAS models are built using Neural Architecture Search (NAS), an algorithmic approach that automates the design process of neural networks. The YOLO-NAS architecture is optimized for speed and accuracy, focusing on improving quantization, efficiency, and accuracy-latency trade-offs.
The models are pre-trained on the Objects365 dataset, which consists of roughly 2 million images spanning 365 object categories.
This massive dataset gives the models a rich contextual understanding, further augmented by knowledge distillation techniques and DFL (Distribution Focal Loss), which strengthen the overall training process.
Additionally, YOLO-NAS is trained on pseudo-labeled data: 118,000 unlabeled images from the COCO dataset that are annotated automatically by a pre-trained model and then used as extra training data. This approach enriches the model's detection capabilities, making it more robust in real-world scenarios.
YOLO-NAS comes in three versions: small, medium, and large. Each of these models caters to different use cases depending on the computational resources available and the level of accuracy required.
These models have also been quantized to INT8 for faster inference on low-resource devices, and they suffer only minimal accuracy loss after quantization.
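As a quick reference, all three variants can be fetched by name through SuperGradients using the same models.get call shown in the inference section below; this is just a sketch of the standard model identifiers:

from super_gradients.training import models

# Standard SuperGradients names for the three YOLO-NAS variants.
yolo_nas_s = models.get("yolo_nas_s", pretrained_weights="coco")  # small
yolo_nas_m = models.get("yolo_nas_m", pretrained_weights="coco")  # medium
yolo_nas_l = models.get("yolo_nas_l", pretrained_weights="coco")  # large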
Running inference using YOLO-NAS is a straightforward process, especially if you’re familiar with tools like PyTorch. In this section, we’ll guide you through the steps to set up YOLO-NAS, download the models, and run inference on images and videos.
YOLO-NAS is available through SuperGradients, a popular open-source library. Here’s how to install the necessary libraries:
!pip install super-gradients
!pip install torch-summary
You might also need to restart your runtime if you’re using Google Colab to avoid dependency issues.
Once the environment is set up, the next step is downloading the YOLO-NAS models. We use the models.get() function from the SuperGradients library to download the pre-trained models.
from super_gradients.training import models
model = models.get("yolo_nas_s", pretrained_weights="coco")
In this example, we are downloading the small YOLO-NAS model pre-trained on the COCO dataset. To make sure inference runs on the GPU, the model should then be moved to a CUDA device, as shown below.
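YOLO-NAS models are regular PyTorch nn.Module instances, so placing one on the GPU uses standard PyTorch device handling; here is a minimal sketch:

import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)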
YOLO-NAS makes it easy to run inference on images using the predict() function. Here's how it works:
image_path = "image.jpg"  # placeholder path to a local image
prediction = model.predict(image_path, conf=0.25)  # conf sets the confidence threshold
prediction.show()
You can adjust the confidence threshold to control how confident the model should be before labeling an object. The output will display the inference results, showing detected objects and their bounding boxes.
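If you need the raw detections rather than a rendered image, recent SuperGradients versions expose them on the prediction object; the attribute names below follow that API and are worth double-checking against the version you have installed:

# Depending on the SuperGradients version, predict() may return a single prediction
# or an iterable of per-image predictions; this sketch assumes the iterable form.
for image_prediction in prediction:
    class_names = image_prediction.class_names
    labels = image_prediction.prediction.labels            # class indices
    confidences = image_prediction.prediction.confidence   # per-box scores
    boxes = image_prediction.prediction.bboxes_xyxy        # [x1, y1, x2, y2]
    for label, score, box in zip(labels, confidences, boxes):
        print(class_names[int(label)], round(float(score), 2), box)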
Running inference on videos follows a similar process, but with a loop that iterates over video frames:
for video_file in video_files:
    # Run detection over every frame and save the annotated result.
    prediction = model.predict(video_file)
    prediction.save(output_path)
This will process the entire video and save the output with detected objects.
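Putting it together as a self-contained sketch (the file names and the _detections suffix are placeholders, not part of the library):

video_files = ["clip_01.mp4", "clip_02.mp4"]  # placeholder input videos

for video_file in video_files:
    prediction = model.predict(video_file)
    # Write the annotated video next to the original, e.g. clip_01_detections.mp4
    output_path = video_file.replace(".mp4", "_detections.mp4")
    prediction.save(output_path)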
The NAS in YOLO-NAS stands for Neural Architecture Search, a cutting-edge approach that automates the architecture design process. NAS uses optimization algorithms like AutoNAC to find the most efficient architecture for a given task.
What makes YOLO-NAS even more powerful is its ability to incorporate hardware- and data-aware decisions, ensuring that the architecture is optimized for real-world constraints such as quantization support and the compilers and hardware it will run on.
One of the major highlights of YOLO-NAS is its quantization-aware design. By incorporating QA-RepVGG blocks, YOLO-NAS models are optimized for 8-bit quantization, which enables significant speedups during inference, especially on hardware that supports low-bit operations.
These blocks also help YOLO-NAS retain high accuracy, even after quantization, ensuring minimal loss during post-training optimization.
YOLO-NAS operates at the efficiency frontier, meaning that at a given latency it offers better accuracy than the other YOLO models it is compared against.
YOLO-NAS represents a major leap forward in real-time object detection. With its combination of Neural Architecture Search, quantization-aware design, and pre-training on massive datasets, YOLO-NAS provides significant improvements over previous YOLO models.
Whether you’re working on a resource-constrained device or require high accuracy, YOLO-NAS offers a solution tailored to your needs.
If you’re interested in trying it out, YOLO-NAS is highly accessible via SuperGradients, and you can easily run inference on images and videos.
For those wanting to dive deeper into object detection models, check out our YOLO Master Class playlist, where we cover YOLOv5, YOLOv8, and other cutting-edge models in detail.
Let us know what you’d like to learn next in the comments below! Until next time, happy coding!