Object detection technology has significantly evolved over the years, and one of the leading architectures driving this innovation is YOLO (You Only Look Once).
YOLO has been instrumental in making object detection both fast and accurate, and its various iterations (YOLOv5, YOLOv7, YOLOv8, etc.) have continuously pushed the limits of what’s possible.
However, the team at Deci has now introduced YOLO-NAS, the latest advancement in the YOLO series, and it outperforms previous YOLO versions on the accuracy-latency trade-off.
In this blog, we’ll explore how YOLO-NAS performs against other YOLO models, its architecture, training methods, and the steps to run inference using YOLO-NAS models.
YOLO-NAS brings impressive improvements over its predecessors:
While these gains may seem incremental compared to older models, it's important to understand that YOLO-NAS strikes an excellent balance between speed and accuracy, making it a formidable choice for real-time applications where both latency and precision are critical.
YOLO-NAS is designed to overcome some of the key limitations in the existing YOLO models, such as inadequate quantization support and suboptimal accuracy-latency trade-offs.
Deci has introduced innovative approaches that allow YOLO-NAS to excel in these areas, pushing the boundaries of real-time object detection.
Key highlights include:
YOLO-NAS models are built using Neural Architecture Search (NAS), an algorithmic approach that automates the design process of neural networks. The YOLO-NAS architecture is optimized for speed and accuracy, focusing on improving quantization, efficiency, and accuracy-latency trade-offs.
The models are pre-trained on the Objects365 dataset, which consists of roughly 2 million images spanning 365 object categories.
This massive dataset gives the models a rich contextual understanding, further augmented by knowledge distillation techniques and DFL (Distribution Focal Loss), which strengthen the overall training process.
Additionally, YOLO-NAS is trained on pseudo-labeled data: 118,000 unlabeled images from the COCO dataset that are annotated automatically by a pre-trained model and then used as extra training data. This approach enriches the model's detection capabilities, making it more robust in real-world scenarios.
YOLO-NAS comes in three versions: small, medium, and large. Each of these models caters to different use cases depending on the computational resources available and the level of accuracy required.
These models have also been quantized to INT8 for faster inference on low-resource devices, and they suffer only minimal accuracy loss after quantization.
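As a quick reference, all three variants can be fetched by name through SuperGradients using the same models.get call shown in the inference section below; this is just a sketch of the standard model identifiers:

from super_gradients.training import models

# Standard SuperGradients names for the three YOLO-NAS variants.
yolo_nas_s = models.get("yolo_nas_s", pretrained_weights="coco")  # small
yolo_nas_m = models.get("yolo_nas_m", pretrained_weights="coco")  # medium
yolo_nas_l = models.get("yolo_nas_l", pretrained_weights="coco")  # large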
Running inference using YOLO-NAS is a straightforward process, especially if you’re familiar with tools like PyTorch. In this section, we’ll guide you through the steps to set up YOLO-NAS, download the models, and run inference on images and videos.
YOLO-NAS is available through SuperGradients, a popular open-source library. Here’s how to install the necessary libraries:
!pip install super-gradients
!pip install torch-summary
You might also need to restart your runtime if you’re using Google Colab to avoid dependency issues.
Once the environment is set up, the next step is downloading the YOLO-NAS models. We use the models.get() function from the SuperGradients library to download the pre-trained models.
from super_gradients.training import models
model = models.get("yolo_nas_s", pretrained_weights="coco")
In this example, we are downloading the small YOLO-NAS model pre-trained on the COCO dataset. To make sure inference runs on the GPU, the model should then be moved to a CUDA device, as shown below.
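YOLO-NAS models are regular PyTorch nn.Module instances, so placing one on the GPU uses standard PyTorch device handling; here is a minimal sketch:

import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)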
YOLO-NAS makes it easy to run inference on images using the predict() function. Here's how it works:
image_path = "image.jpg"  # placeholder path to a local image
prediction = model.predict(image_path, conf=0.25)  # conf sets the confidence threshold
prediction.show()
You can adjust the confidence threshold to control how confident the model should be before labeling an object. The output will display the inference results, showing detected objects and their bounding boxes.
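If you need the raw detections rather than a rendered image, recent SuperGradients versions expose them on the prediction object; the attribute names below follow that API and are worth double-checking against the version you have installed:

# Depending on the SuperGradients version, predict() may return a single prediction
# or an iterable of per-image predictions; this sketch assumes the iterable form.
for image_prediction in prediction:
    class_names = image_prediction.class_names
    labels = image_prediction.prediction.labels            # class indices
    confidences = image_prediction.prediction.confidence   # per-box scores
    boxes = image_prediction.prediction.bboxes_xyxy        # [x1, y1, x2, y2]
    for label, score, box in zip(labels, confidences, boxes):
        print(class_names[int(label)], round(float(score), 2), box)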
Running inference on videos follows a similar process, but with a loop that iterates over video frames:
for video_file in video_files:
    # Run detection over every frame and save the annotated result.
    prediction = model.predict(video_file)
    prediction.save(output_path)
This will process the entire video and save the output with detected objects.
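Putting it together as a self-contained sketch (the file names and the _detections suffix are placeholders, not part of the library):

video_files = ["clip_01.mp4", "clip_02.mp4"]  # placeholder input videos

for video_file in video_files:
    prediction = model.predict(video_file)
    # Write the annotated video next to the original, e.g. clip_01_detections.mp4
    output_path = video_file.replace(".mp4", "_detections.mp4")
    prediction.save(output_path)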
The NAS in YOLO-NAS stands for Neural Architecture Search, a cutting-edge approach that automates the architecture design process. NAS uses optimization algorithms like AutoNAC to find the most efficient architecture for a given task.
What makes YOLO-NAS even more powerful is its ability to incorporate hardware- and data-aware decisions, ensuring that the architecture is optimized for real-world constraints such as quantization support and the compilers and hardware it will run on.
One of the major highlights of YOLO-NAS is its quantization-aware design. By incorporating QA-RepVGG blocks, YOLO-NAS models are optimized for 8-bit quantization, which enables significant speedups during inference, especially on hardware that supports low-bit operations.
These blocks also help YOLO-NAS retain high accuracy, even after quantization, ensuring minimal loss during post-training optimization.
YOLO-NAS operates at the efficiency frontier, meaning that at a given latency it offers better accuracy than the other YOLO models it is compared against.
YOLO-NAS represents a major leap forward in real-time object detection. With its combination of Neural Architecture Search, quantization-aware design, and pre-training on massive datasets, YOLO-NAS provides significant improvements over previous YOLO models.
Whether you’re working on a resource-constrained device or require high accuracy, YOLO-NAS offers a solution tailored to your needs.
If you’re interested in trying it out, YOLO-NAS is highly accessible via SuperGradients, and you can easily run inference on images and videos.
For those wanting to dive deeper into object detection models, check out our YOLO Master Class playlist, where we cover YOLOv5, YOLOv8, and other cutting-edge models in detail.
Let us know what you’d like to learn next in the comments below! Until next time, happy coding!