This repository provides a small desktop workflow for collecting images, annotating defects in YOLO format, launching YOLOv8 training, and running live model inference against a camera feed.
In scope:

- Live camera preview and capture in `main.py`
- Mandatory post-capture labeling, null marking, and capture discard behavior in `main.py`
- Class-name and class-color management through flat files
- Dataset browsing/edit/delete flows in `main.py`
- Temporary YOLO dataset preparation and training-process launch in `train_model.py`
- Live model inference, telemetry overlay, and screenshot capture in `run_inference.py`
Out of scope:

- Headless services or APIs
- Multi-user coordination
- Database-backed storage
- Automated pre-labeling or proposal models
- Production-grade inspection throughput or hardened deployment behavior
The repo is a script-oriented desktop application rather than a packaged Python library. Each top-level script owns a user-facing workflow:
- `main.py` is the primary integration script and contains the largest amount of behavior: capture, annotation, inspect/edit flows, camera settings, and supporting persistence helpers.
- `train_model.py` is a companion GUI that prepares a disposable `.yolo_training_cache/` tree and launches the Ultralytics CLI in a subprocess.
- `run_inference.py` is a companion GUI that loads a YOLO model, reads camera frames on a timer, and renders detections plus local telemetry.
The tools are coupled through on-disk conventions instead of imports between scripts. Shared concepts such as classes, colors, capture folders, and splash/icon handling are duplicated in small amounts across scripts rather than centralized into a common package.
- `main.py`: capture window, OpenCV annotation loop, YOLO label read/write helpers, class/color editing, inspect-mode navigation, and camera property dialog
- `train_model.py`: classes-file reader, GPU detection, training-cache builder, Ultralytics CLI resolution, and training log/progress UI
- `run_inference.py`: Torch preload, delayed OpenCV import, model loading, inference overlay rendering, telemetry collection, and inference screenshot capture
- `classes.txt`: ordered class list used across labeling, training, and inference
- `class_colors.json`: optional RGB palette aligned with class indices
- `config.json`: preview-timer and annotation-loop timing overrides
- `captures/`: image store for captured images plus `null/` and `inference/` subfolders
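Given the two documented override keys, a `config.json` might look like the following (the values here are illustrative, not the repo's actual defaults):

```json
{
  "timer_interval_ms": 33,
  "annotate_wait_key_ms": 20
}
```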
Capture and label:

- `CameraWindow` opens a selected camera and shows the live feed.
- A capture writes a timestamped image under `captures/`.
- `annotate_image()` opens an OpenCV labeling window inside the running Qt application.
- Saving writes an adjacent YOLO label file, null-marking moves the image to `captures/null/`, and cancelling removes the captured image, so the main capture flow intentionally leaves no new unlabeled files behind.
Training launch:

- The training window accepts a dataset folder plus `classes.txt`.
- `prepare_dataset()` filters to images that already have adjacent `.txt` labels.
- The method rebuilds `.yolo_training_cache/`, distributes labeled items into train/validation folders with a deterministic shuffle, and writes `data.yaml`.
- A `QProcess` starts `yolo detect train ...` and streams merged logs back into the GUI while parsing simple `current/total` tokens for progress and ETA.
Live inference:

- The inference window preloads Torch before importing OpenCV to reduce Windows DLL/OpenMP conflicts.
- The user chooses a camera and a `.pt` model.
- A timer reads frames, runs YOLO inference when a model is loaded, draws detections, and overlays telemetry.
- Optional screenshots are written to `captures/inference/`.
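A telemetry overlay of this kind typically includes a frame-rate estimate. A minimal rolling-FPS meter could look like this (`FpsMeter` is a hypothetical sketch, not code taken from `run_inference.py`):

```python
from collections import deque

class FpsMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window: int = 30):
        self.timestamps = deque(maxlen=window)

    def tick(self, now: float) -> float:
        """Record a frame timestamp (seconds) and return the current FPS."""
        self.timestamps.append(now)
        if len(self.timestamps) < 2:
            return 0.0  # not enough samples yet
        span = self.timestamps[-1] - self.timestamps[0]
        return (len(self.timestamps) - 1) / span if span > 0 else 0.0
```

The timer callback would call `tick(time.monotonic())` once per frame and draw the returned value onto the frame.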
- YOLO labels are stored next to images using `<image-stem>.txt`.
- Each valid label line is `class_id x_center y_center width height` with normalized coordinates.
- `classes.txt` is one class per non-empty line.
- `class_colors.json` is a JSON array of RGB triples; invalid or missing data falls back to built-in defaults.
- `config.json` is optional and only overrides `timer_interval_ms` and `annotate_wait_key_ms` within bounded ranges.
- `.yolo_training_cache/` is disposable output and is recreated by the training tool.
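The label-line convention can be captured in a small validating parser. This is a sketch under the documented format (five whitespace-separated fields, integer class id, normalized floats), not the repo's actual helper:

```python
def parse_yolo_label_line(line: str):
    """Parse one YOLO label line into (class_id, xc, yc, w, h), or None if invalid."""
    parts = line.split()
    if len(parts) != 5:
        return None
    try:
        class_id = int(parts[0])
        xc, yc, w, h = (float(v) for v in parts[1:])
    except ValueError:
        return None
    # Coordinates must be normalized to [0, 1]; class ids are non-negative.
    if class_id < 0 or not all(0.0 <= v <= 1.0 for v in (xc, yc, w, h)):
        return None
    return class_id, xc, yc, w, h
```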
- File-based helper functions usually fall back to defaults instead of raising user-visible exceptions.
- GUI problems are surfaced through status labels, message boxes, and the training log view.
- Camera and GPU support are best-effort and depend on local drivers/runtime availability.
- The inference app fails early on missing Torch/Ultralytics imports and shows a blocking error dialog.
- The training app logs a clear message if the `yolo` CLI cannot be resolved or started.
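CLI resolution can be sketched as a best-effort lookup that returns `None` rather than raising, so the caller can log the message. `resolve_yolo_cli` is hypothetical; the real resolution logic lives in `train_model.py`:

```python
import shutil
import sys
from pathlib import Path

def resolve_yolo_cli():
    """Best-effort lookup of the Ultralytics `yolo` executable.

    Returns an executable path, or None so the caller can log a clear message.
    """
    found = shutil.which("yolo")
    if found:
        return found
    # Fall back to the Scripts/bin directory of the current interpreter,
    # which covers virtualenv installs that are not on PATH.
    scripts = Path(sys.executable).parent
    for name in ("yolo", "yolo.exe"):
        candidate = scripts / name
        if candidate.exists():
            return str(candidate)
    return None
```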
- Windows desktop environment with camera access
- Python runtime compatible with the current scripts; the README currently recommends Python 3.12
- `opencv-python`
- `PyQt5`
- `ultralytics`
- `torch`
- `psutil`
- `pynvml` (optional)
- `nvidia-smi` (optional)
- `pyinstaller` (optional)
- Unit tests should cover deterministic helpers such as config parsing, class/color loading, label read/write helpers, and ETA formatting.
- Implementation tests should cover dataset-cache creation and warning behavior without requiring real cameras or GPUs.
- Hygiene tests should enforce required repo files, deviations-file rules, ignore rules, and footer policy.
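For example, the deterministic classes-file parsing could be covered by a plain pytest-style function. `load_classes` here is a hypothetical stand-in for the repo's classes-file reader, shown only to illustrate the test shape:

```python
def load_classes(text: str):
    """Parse classes.txt-style content: one class name per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def test_load_classes_skips_blank_lines():
    # Blank lines and surrounding whitespace must not produce empty classes.
    assert load_classes("scratch\n\ndent\n") == ["scratch", "dent"]
```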
- Captures created through `main.py` should end as labeled data, a null capture under `captures/null/`, or be deleted when labeling is cancelled.
- The training workflow only uses images that already have adjacent label files.
- Shared state between tools lives on disk, not in a service or database.
- The three top-level scripts are the current integration boundaries; cross-tool coupling happens through files and conventions rather than shared modules.