Object detection is a core task in computer vision, the field concerned with extracting information from visual data, and modern detectors rely heavily on machine learning.
YOLO (You Only Look Once) is a real-time object detection algorithm that is widely used across applications and is known for its balance of speed and accuracy.
The algorithm simplifies detection by using a single neural network that processes an entire image in one forward pass, predicting object locations and classes simultaneously.
By dividing the image into a grid, YOLO can localize objects efficiently, trading some accuracy for significant speed improvements.
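The grid idea can be illustrated with a small sketch. It assumes a 7x7 grid (the YOLOv1 setting) and a hypothetical helper name, `responsible_cell`, and shows only how a box center maps to the one cell that is expected to predict it. It is not the network itself.

```python
def responsible_cell(cx, cy, img_w, img_h, grid_size=7):
    """Map a box center (in pixels) to the grid cell that contains it.

    Returns (row, col, x_offset, y_offset), where the offsets are the
    position of the center inside the cell, expressed in cell units.
    The function name and the return layout are illustrative assumptions.
    """
    # Scale pixel coordinates to grid units.
    gx = cx / img_w * grid_size
    gy = cy / img_h * grid_size
    # Clamp so a center exactly on the right/bottom edge stays in the grid.
    col = min(int(gx), grid_size - 1)
    row = min(int(gy), grid_size - 1)
    return row, col, gx - col, gy - row


# Example: the center of a 448x448 image falls in the middle cell.
row, col, dx, dy = responsible_cell(224, 224, 448, 448)
print(row, col, dx, dy)
```

Because each object is assigned to a single cell, the whole image can be handled in one pass, at the cost of coarser localization when several small objects share a cell.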
Non-Maximum Suppression (NMS) is applied to remove redundant or overlapping bounding boxes, keeping only the most confident detections for each object.
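A minimal sketch of greedy NMS follows, assuming boxes are given as `(x1, y1, x2, y2)` corner coordinates and that overlap is measured by intersection over union (IoU). The function names and the 0.5 threshold are illustrative choices, not values taken from any particular YOLO release.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap are zero when the boxes are disjoint.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept by greedy NMS, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a better, already kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

In this example the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the distant third box survives. In practice NMS is usually run per class.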
YOLOv1 divided the input image into a 7x7 grid, whereas YOLOv2 introduced anchor boxes, and its YOLO9000 variant added hierarchical classification, which allowed joint training on detection and classification data.
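To make the anchor-box idea concrete, the sketch below decodes one raw prediction in the YOLOv2 style: the center is a sigmoid offset from the cell's top-left corner, and the size is an exponential scaling of the anchor (prior) dimensions. All quantities are in grid-cell units, and the function and parameter names are assumptions made for illustration.

```python
import math


def decode_anchor_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h):
    """Turn raw outputs (tx, ty, tw, th) into a box (bx, by, bw, bh).

    cell_x, cell_y: column and row index of the predicting grid cell.
    anchor_w, anchor_h: width and height of the anchor, in grid units.
    """
    def sigmoid(v):
        return 1.0 / (1.0 + math.exp(-v))

    # The sigmoid keeps the predicted center inside the predicting cell.
    bx = cell_x + sigmoid(tx)
    by = cell_y + sigmoid(ty)
    # The anchor is scaled multiplicatively, so sizes stay positive.
    bw = anchor_w * math.exp(tw)
    bh = anchor_h * math.exp(th)
    return bx, by, bw, bh


# With all raw outputs at zero, the box sits at the cell center
# and has exactly the anchor's size.
print(decode_anchor_box(0, 0, 0, 0, cell_x=3, cell_y=4, anchor_w=2.0, anchor_h=1.5))
```

Predicting offsets relative to a prior shape, rather than raw coordinates, gives the network a reasonable starting box for each cell and tends to make training more stable.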
YOLOv4 introduced numerous state-of-the-art techniques, including CSPDarknet53 as the backbone, Spatial Pyramid Pooling (SPP) for improved feature extraction, and Path Aggregation Network (PANet) for better feature fusion.
Developed by Ultralytics, YOLOv5 prioritized usability, offering a streamlined training pipeline and a PyTorch-based implementation.
YOLOv8 introduced an anchor-free design, simplifying training and improving detection of objects of varying sizes. Its architecture incorporates the C2f module, which builds on the cross-stage partial design of CSPNet.
YOLOv9 introduced two key innovations to address information loss in deep networks: Programmable Gradient Information (PGI) and a novel architecture called Generalized Efficient Layer Aggregation Network (GELAN).