---
title: MaixPy MaixCAM Using YOLOv5 / YOLOv8 / YOLO11 / YOLO26 for Object Detection
---

# MaixPy: Object Detection with YOLOv5 / YOLOv8 / YOLO11 / YOLO26 Models

## Concept of Object Detection

Object detection refers to identifying the positions and categories of targets in images or videos, for example detecting objects like apples and airplanes in an image and marking their locations.

Unlike image classification, object detection includes positional information, so its result is usually a bounding box that outlines the object's position.

## Using Object Detection in MaixPy

MaixPy natively supports the **YOLOv5**, **YOLOv8**, **YOLO11**, and **YOLO26** models, which can be used directly:

> YOLOv8 requires MaixPy >= 4.3.0.
> YOLO11 requires MaixPy >= 4.7.0.
> YOLO26 requires MaixPy >= 4.12.5.

```python
from maix import camera, display, image, nn, app

detector = nn.YOLOv5(model="/root/models/yolov5s.mud", dual_buff=True)
# detector = nn.YOLOv8(model="/root/models/yolov8n.mud", dual_buff=True)
# detector = nn.YOLO11(model="/root/models/yolo11n.mud", dual_buff=True)
# detector = nn.YOLO26(model="/root/models/yolo26n.mud", dual_buff=True)

cam = camera.Camera(detector.input_width(), detector.input_height(), detector.input_format())
disp = display.Display()

while not app.need_exit():
    img = cam.read()
    objs = detector.detect(img, conf_th=0.5, iou_th=0.45)
    for obj in objs:
        img.draw_rect(obj.x, obj.y, obj.w, obj.h, color=image.COLOR_RED)
        msg = f'{detector.labels[obj.class_id]}: {obj.score:.2f}'
        img.draw_string(obj.x, obj.y, msg, color=image.COLOR_RED)
    disp.show(img)
```
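The `conf_th` and `iou_th` arguments control confidence filtering and non-maximum suppression (NMS). The following is a minimal pure-Python sketch of what these two thresholds do, independent of the maix API; all function names here are illustrative, not part of MaixPy:

```python
def iou(a, b):
    # boxes are (x, y, w, h); return intersection-over-union
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def nms(dets, conf_th=0.5, iou_th=0.45):
    # dets: list of (box, score). First drop low-confidence boxes,
    # then greedily keep the highest-scoring box and discard any
    # remaining box whose IoU with a kept box exceeds iou_th.
    dets = [d for d in dets if d[1] >= conf_th]
    dets.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        if all(iou(box, k[0]) < iou_th for k in kept):
            kept.append((box, score))
    return kept
```

Raising `conf_th` removes uncertain detections; lowering `iou_th` merges overlapping boxes more aggressively.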

Demo video:

<div>
<video playsinline controls autoplay loop muted preload src="/static/video/detector.mp4" type="video/mp4">
</video>
</div>

The code above captures an image from the camera, passes it to the `detector` for inference, and then displays the detection results (category names and positions) on the screen.

You can switch between **YOLOv5 / YOLOv8 / YOLO11 / YOLO26** simply by replacing the corresponding model initialization line; remember to update the model file path as well.

See the appendix of this article for the list of 80 object categories supported by the pre-trained models.

For more API details, refer to the documentation of the [maix.nn](/api/maix/nn.html) module.

## Dual Buffer Acceleration (`dual_buff`)

You may notice the `dual_buff` parameter in the model initialization (it is `True` by default). Enabling it can improve runtime efficiency and frame rate. For the underlying principle and usage notes, see [Introduction to dual_buff](./dual_buff.md).
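The general idea behind double buffering can be sketched as follows. This is not the maix implementation, just an illustrative producer/consumer pattern in plain Python: while one frame is being processed, the next one is already being prepared, so capture and inference overlap instead of running strictly one after the other.

```python
import queue
import threading

def capture_frames(n):
    # stand-in for the camera: yields frame identifiers
    for i in range(n):
        yield f"frame-{i}"

def run(n=5):
    buf = queue.Queue(maxsize=1)  # one frame "in flight" while the next is captured
    results = []

    def infer_worker():
        # stand-in for the inference thread consuming frames
        while True:
            frame = buf.get()
            if frame is None:  # sentinel: no more frames
                break
            results.append(f"detections({frame})")  # stand-in for detector.detect()

    t = threading.Thread(target=infer_worker)
    t.start()
    for frame in capture_frames(n):
        buf.put(frame)  # hand the frame off, then immediately grab the next one
    buf.put(None)
    t.join()
    return results
```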

## More Input Resolutions

The default model input resolution is **320x224** on MaixCAM and **640x480** on MaixCAM2, aspect ratios close to the devices' native screen resolutions. You can also manually download models with other resolutions:

YOLOv5: [https://maixhub.com/model/zoo/365](https://maixhub.com/model/zoo/365)
YOLOv8: [https://maixhub.com/model/zoo/400](https://maixhub.com/model/zoo/400)
YOLO11: [https://maixhub.com/model/zoo/453](https://maixhub.com/model/zoo/453)

Higher resolutions yield higher detection accuracy but take longer to run. Choose the appropriate resolution based on your application scenario.

## Which to Choose: YOLOv5, YOLOv8, YOLO11, or YOLO26?

The pre-provided models are **YOLOv5s**, **YOLOv8n**, **YOLO11n**, and **YOLO26n**. The YOLOv5s model is larger, while YOLOv8n, YOLO11n, and YOLO26n run slightly faster. According to official data, the accuracy ranking is **YOLO26n > YOLO11n > YOLOv8n > YOLOv5s**. Test them on your own workload and pick the one that fits your needs.

You can also try the **YOLOv8s** or **YOLO11s** models. Their frame rates are slightly lower (e.g., yolov8s_320x224 runs 10 ms slower than yolov8n_320x224), but their accuracy is higher than the nano versions. These models can be downloaded from the model zoos linked above or exported yourself from the official YOLO repositories.

## Can the Camera and Model Use Different Resolutions?

When `detector.detect(img)` is called, if the resolution of `img` differs from the model's input resolution, the function automatically calls `img.resize` to scale the image to the model's input resolution. The default resize method is `image.Fit.FIT_CONTAIN`, which scales the image while maintaining its aspect ratio and fills the surrounding areas with black. The detected bounding box coordinates are automatically mapped back to the coordinates of the original `img`.
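The coordinate mapping that `FIT_CONTAIN` implies can be sketched in plain Python. This is illustrative only (the actual scaling happens inside the maix library, and these function names are made up for the example):

```python
def fit_contain(src_w, src_h, dst_w, dst_h):
    # scale factor and padding used to letterbox (src_w x src_h)
    # into (dst_w x dst_h) while keeping the aspect ratio
    scale = min(dst_w / src_w, dst_h / src_h)
    pad_x = (dst_w - src_w * scale) / 2
    pad_y = (dst_h - src_h * scale) / 2
    return scale, pad_x, pad_y

def map_box_back(box, src_w, src_h, dst_w, dst_h):
    # map a box detected in model input coordinates back to the original image
    scale, pad_x, pad_y = fit_contain(src_w, src_h, dst_w, dst_h)
    x, y, w, h = box
    return ((x - pad_x) / scale, (y - pad_y) / scale, w / scale, h / scale)
```

For example, letterboxing a 640x480 camera frame into a 320x224 model input scales by 224/480 and pads the left and right edges; a box detected in the 320x224 frame is shifted and rescaled back into 640x480 coordinates.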

## Train Custom Object Detection Models Online with MaixHub

If you need to detect specific objects instead of using the pre-trained 80-class model, visit [MaixHub](https://maixhub.com) to learn and train custom object detection models; simply select **Object Detection Model** when creating a project. For details, refer to the [MaixHub Online Training Documentation](./maixhub_train.md).

You can also find models shared by the community in the [MaixHub Model Zoo](https://maixhub.com/model/zoo?platform=maixcam).

## Train Custom Object Detection Models Offline

We strongly recommend starting with MaixHub online training, as offline training is more complex and not suggested for beginners. This method assumes basic relevant knowledge that will not be covered here; search online for solutions if you encounter problems.

See [Offline Training of YOLOv5 Models](./customize_model_yolov5.md) or [Offline Training of YOLOv8/YOLO11/YOLO26 Models](./customize_model_yolov8.md) for details.

## Appendix: 80 Object Categories
The 80 object categories of the COCO dataset are as follows:
```txt
person
bicycle
...
scissors
teddy bear
hair drier
toothbrush
```