
Commit de79986

Code refactoring (#3)
Refactor codebase for improved structure and readability:
- Organize utility functions into dedicated modules.
- Separate benchmarking and prediction logic into distinct directories.
- Streamline model initialization for CUDA, ONNX, and other environments.
- Enhance benchmark visualization and address Seaborn deprecation warnings.
- Improve error handling and logging for better debugging.
1 parent a9d16c7 commit de79986

12 files changed

Lines changed: 465 additions & 422 deletions


README.md

Lines changed: 21 additions & 65 deletions
@@ -6,20 +6,21 @@
 2. [Requirements](#requirements)
 - [Steps to Run](#steps-to-run)
 - [Example Command](#example-command)
-5. [RESULTS](#results) ![Static Badge](https://img.shields.io/badge/update-orange)
+3. [RESULTS](#results) ![Static Badge](https://img.shields.io/badge/update-orange)
 - [Results explanation](#results-explanation)
 - [Example Input](#example-input)
-6. [Benchmark Implementation Details](#benchmark-implementation-details) ![New](https://img.shields.io/badge/-New-842E5B)
+- [Example prediction results](#example-prediction-results)
+4. [Benchmark Implementation Details](#benchmark-implementation-details) ![New](https://img.shields.io/badge/-New-842E5B)
 - [PyTorch CPU & CUDA](#pytorch-cpu--cuda)
 - [TensorRT FP32 & FP16](#tensorrt-fp32--fp16)
 - [ONNX](#onnx)
 - [OpenVINO](#openvino)
-7. [Used methodologies](#used-methodologies) ![New](https://img.shields.io/badge/-New-96E5FE)
+5. [Benchmarking and Visualization](#benchmarking-and-visualization) ![New](https://img.shields.io/badge/-New-96E5FE)
 - [TensorRT Optimization](#tensorrt-optimization)
 - [ONNX Exporter](#onnx-exporter)
 - [OV Exporter](#ov-exporter)
-10. [Author](#author)
-11. [References](#references)
+6. [Author](#author)
+7. [References](#references)
 
 
 <img src="./inference/plot.png" width="100%">
@@ -44,20 +45,20 @@ docker build -t awesome-tensorrt
 docker run --gpus all --rm -it awesome-tensorrt
 
 # 3. Run the Script inside the Container
-python src/main.py
+python main.py [--mode all]
 ```
 
 ### Arguments
 - `--image_path`: (Optional) Specifies the path to the image you want to predict.
 - `--topk`: (Optional) Specifies the number of top predictions to show. Defaults to 5 if not provided.
-- `--mode`: Specifies the mode for exporting and running the model. Choices are: `onnx`, `ov`, `all`.
+- `--mode`: (Optional) Specifies the mode for exporting and running the model. Choices are: `onnx`, `ov`, `all`. If not provided, it defaults to `all`.
 
 ### Example Command
 ```sh
-python src/main.py --topk 3 --mode=all
+python main.py --topk 3 --mode=ov
 ```
 
-This command will run predictions on the default image (`./inference/cat3.jpg`), show the top 3 predictions, and run all models (PyTorch CPU, CUDA, ONNX, OV, TRT-FP16, TRT-FP32). At the end results plot will be saved to `./inference/plot.png`
+This command will run predictions on the default image (`./inference/cat3.jpg`), show the top 3 predictions, and run the OpenVINO model. Note: the results plot is created only for `--mode=all` and is saved to `./inference/plot.png`.
 
 ## RESULTS
 ### Inference Benchmark Results
@@ -76,6 +77,15 @@ Here is an example of the input image to run predictions and benchmarks on:
 
 <img src="./inference/cat3.jpg" width="20%">
 
+### Example prediction results
+```
+#1: 15% Egyptian cat
+#2: 14% tiger cat
+#3: 9% tabby
+#4: 2% doormat
+#5: 2% lynx
+```
+
 ## Benchmark Implementation Details
 Here you can see the flow for each model and benchmark.
 
@@ -116,62 +126,8 @@ OpenVINO is a toolkit from Intel that optimizes deep learning model inference fo
 4. Perform inference on the provided image using the OpenVINO model.
 5. Benchmark results, including average inference time, are logged for the OpenVINO model.
 
-## Used methodologies
-### TensorRT Optimization
-TensorRT is a high-performance deep learning inference optimizer and runtime library developed by NVIDIA. It is designed for optimizing and deploying trained neural network models on production environments. This project supports TensorRT optimizations in FP32 (single precision) and FP16 (half precision) modes, offering different trade-offs between inference speed and model accuracy.
-
-#### Features
-- **Performance Boost**: TensorRT can significantly accelerate the inference of neural network models, making it suitable for deployment in resource-constrained environments.
-- **Precision Modes**: Supports FP32 for maximum accuracy and FP16 for faster performance with a minor trade-off in accuracy.
-- **Layer Fusion**: TensorRT fuses layers and tensors in the neural network to reduce memory access overhead and improve execution speed.
-- **Dynamic Tensor Memory**: Efficiently handles varying batch sizes without re-optimization.
-
-#### Usage
-When running the main script, use the `--mode all` argument to employ TensorRT optimizations in the project.
-This will initiate all models, including PyTorch models, that will be compiled to the TRT model with `FP16` and `FP32` precision modes. Then, in one of the steps, we will run inference on the specified image using the TensorRT-optimized model.
-Example:
-```sh
-python src/main.py --mode all
-```
-#### Requirements
-Ensure you have the TensorRT library and the torch_tensorrt package installed in your environment. Also, for FP16 optimizations, it's recommended to have a GPU that supports half-precision arithmetic (like NVIDIA GPUs with Tensor Cores).
-
-### ONNX Exporter
-ONNX Model Exporter (`ONNXExporter`) utility is incorporated within this project to enable converting the native PyTorch model into the ONNX format.
-Using the ONNX format, inference and benchmarking can be performed with the ONNX Runtime, which offers platform-agnostic optimizations and is widely supported across numerous platforms and devices.
-
-#### Features
-- **Standardized Format**: ONNX provides an open-source format for AI models. It defines an extensible computation graph model and definitions of built-in operators and standard data types.
-- **Interoperability**: Models in ONNX format can be used across various frameworks, tools, runtimes, and compilers.
-- **Optimizations**: The ONNX Runtime provides performance optimizations for both cloud and edge devices.
-
-#### Usage
-To leverage the `ONNXExporter` and conduct inference using the ONNX Runtime, utilize the `--mode onnx` argument when executing the main script.
-This will initiate the conversion process and then run inference on the specified image using the ONNX model.
-Example:
-```sh
-python src/main.py --mode onnx
-```
-
-#### Requirements
-Ensure the ONNX library is installed in your environment to use the ONNXExporter. Additionally, if you want to run inference using the ONNX model, install the ONNX Runtime.
-
-### OV Exporter
-OpenVINO Model Exporter utility (`OVExporter`) has been integrated into this project to facilitate the conversion of the ONNX model to the OpenVINO format.
-This enables inference and benchmarking using OpenVINO, a framework optimized for Intel hardware, providing substantial speed improvements, especially on CPUs.
-
-#### Features
-- **Model Optimization**: Converts the ONNX model to OpenVINO's Intermediate Representation (IR) format. This optimized format allows for faster inference times on Intel hardware.
-- **Versatility**: OpenVINO can target various Intel hardware devices such as CPUs, integrated GPUs, FPGAs, and VPUs.
-- **Ease of Use**: The `OVExporter` seamlessly transitions from ONNX to OpenVINO, abstracting the conversion details and providing a straightforward interface.
-
-#### Usage
-To utilize `OVExporter` and perform inference using OpenVINO, use the `--mode ov` argument when running the main script.
-This will trigger the conversion process and subsequently run inference on the provided image using the optimized OpenVINO model.
-Example:
-```sh
-python src/main.py --mode ov
-```
+## Benchmarking and Visualization
+The results of the benchmarks for all modes are saved and visualized in a bar chart, showcasing the average inference times across different backends. The visualization aids in comparing the performance gains achieved with different optimizations.
 
 #### Requirements
 Ensure you have installed the OpenVINO Toolkit and the necessary dependencies to use OpenVINO's model optimizer and inference engine.

benchmark/__init__.py

Whitespace-only changes.

benchmark/benchmark_models.py

Lines changed: 20 additions & 0 deletions
```python
import src.benchmark_class
from benchmark.benchmark_utils import run_benchmark
from src.benchmark_class import PyTorchBenchmark, ONNXBenchmark, OVBenchmark
import openvino as ov
import torch
import onnxruntime as ort


def benchmark_onnx_model(ort_session: ort.InferenceSession):
    run_benchmark(None, None, None, ort_session, onnx=True)


def benchmark_ov_model(ov_model: ov.CompiledModel) -> src.benchmark_class.OVBenchmark:
    ov_benchmark = OVBenchmark(ov_model, input_shape=(1, 3, 224, 224))
    ov_benchmark.run()
    return ov_benchmark


def benchmark_cuda_model(cuda_model: torch.nn.Module, device: str, dtype: torch.dtype):
    run_benchmark(cuda_model, device, dtype)
```
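For orientation, here is a minimal sketch of how these wrappers might be driven, using the `init_onnx_model`, `init_ov_model`, and `init_cuda_model` helpers added in `common/utils.py` further down. The `ModelLoader(device=...)` constructor is an assumption; its signature is not part of this diff, and this driver script is not one of the committed files.

```python
# Hypothetical driver for the wrappers above (not part of this commit's files).
import torch

from src.model import ModelLoader  # constructor signature assumed
from common.utils import init_onnx_model, init_ov_model, init_cuda_model
from benchmark.benchmark_models import (
    benchmark_onnx_model,
    benchmark_ov_model,
    benchmark_cuda_model,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_loader = ModelLoader(device=device)  # assumed signature
onnx_path = "./inference/model.onnx"

ort_session = init_onnx_model(onnx_path, model_loader, device)  # exports ONNX, then opens a Runtime session
benchmark_onnx_model(ort_session)

ov_model = init_ov_model(onnx_path)  # compiles the exported ONNX model with OpenVINO
ov_benchmark = benchmark_ov_model(ov_model)

cuda_model = init_cuda_model(model_loader, device, torch.float32)  # JIT-traced when the device is CUDA
benchmark_cuda_model(cuda_model, str(device), torch.float32)
```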

benchmark/benchmark_utils.py

Lines changed: 125 additions & 0 deletions
```python
import logging

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from typing import Dict, Any
import torch
import onnxruntime as ort

from src.benchmark_class import PyTorchBenchmark, ONNXBenchmark, OVBenchmark


def run_benchmark(
    model: torch.nn.Module,
    device: str,
    dtype: torch.dtype,
    ort_session: ort.InferenceSession = None,
    onnx: bool = False,
) -> None:
    """
    Run and log the benchmark for the given model, device, and dtype.

    :param onnx: Whether to benchmark an ONNX Runtime session instead of a PyTorch model.
    :param ort_session: ONNX Runtime inference session, required when onnx=True.
    :param model: The model to be benchmarked.
    :param device: The device to run the benchmark on ("cpu" or "cuda").
    :param dtype: The data type to be used in the benchmark (typically torch.float32 or torch.float16).
    """
    if onnx:
        logging.info(f"Running Benchmark for ONNX")
        benchmark = ONNXBenchmark(ort_session, input_shape=(32, 3, 224, 224))
    else:
        logging.info(f"Running Benchmark for {device.upper()} and precision {dtype}")
        benchmark = PyTorchBenchmark(model, device=device, dtype=dtype)
    benchmark.run()


def run_all_benchmarks(
    models: Dict[str, Any], img_batch: np.ndarray
) -> Dict[str, float]:
    """
    Run benchmarks for all models and return a dictionary of average inference times.

    :param models: Dictionary of models. Key is model type ("onnx", "ov", "pytorch", "trt_fp32", "trt_fp16"), value is the model.
    :param img_batch: The batch of images to run the benchmark on.
    :return: Dictionary of average inference times. Key is model type, value is average inference time.
    """
    results = {}

    # ONNX benchmark
    logging.info(f"Running benchmark inference for ONNX model")
    onnx_benchmark = ONNXBenchmark(models["onnx"], img_batch.shape)
    avg_time_onnx = onnx_benchmark.run()
    results["ONNX"] = avg_time_onnx

    # OpenVINO benchmark
    logging.info(f"Running benchmark inference for OpenVINO model")
    ov_benchmark = OVBenchmark(models["ov"], img_batch.shape)
    avg_time_ov = ov_benchmark.run()
    results["OpenVINO"] = avg_time_ov

    # PyTorch + TRT benchmark
    configs = [
        ("cpu", torch.float32, False),
        ("cuda", torch.float32, False),
        ("cuda", torch.float32, True),
        ("cuda", torch.float16, True),
    ]
    for device, precision, is_trt in configs:
        model_to_use = models[f"PyTorch_{device}"].to(device)

        if not is_trt:
            pytorch_benchmark = PyTorchBenchmark(
                model_to_use, device=device, dtype=precision
            )
            logging.info(f"Running benchmark inference for PyTorch_{device} model")
            avg_time_pytorch = pytorch_benchmark.run()
            results[f"PyTorch_{device}"] = avg_time_pytorch

        else:
            # TensorRT benchmarks
            if precision == torch.float32 or precision == torch.float16:
                mode = "fp32" if precision == torch.float32 else "fp16"
                logging.info(f"Running benchmark inference for TRT_{mode} model")
                trt_benchmark = PyTorchBenchmark(
                    models[f"trt_{mode}"], device=device, dtype=precision
                )
                avg_time_trt = trt_benchmark.run()
                results[f"TRT_{mode}"] = avg_time_trt

    return results


def plot_benchmark_results(results: Dict[str, float]):
    """
    Plot the benchmark results using Seaborn.

    :param results: Dictionary of average inference times. Key is model type, value is average inference time.
    """
    # Convert dictionary to two lists for plotting
    models = list(results.keys())
    times = list(results.values())

    # Create a DataFrame for plotting
    data = pd.DataFrame({"Model": models, "Time": times})

    # Sort the DataFrame by Time
    data = data.sort_values("Time", ascending=True)

    # Plot
    plt.figure(figsize=(10, 6))
    ax = sns.barplot(x=data["Time"], y=data["Model"], hue=data["Model"], palette="rocket", legend=False)

    # Adding the actual values on the bars
    for index, value in enumerate(data["Time"]):
        ax.text(value, index, f"{value:.2f} ms", color="black", ha="left", va="center")

    plt.xlabel("Average Inference Time (ms)")
    plt.ylabel("Model Type")
    plt.title("ResNet50 - Inference Benchmark Results")

    # Save the plot to a file
    plt.savefig("./inference/plot.png", bbox_inches="tight")
    plt.show()
```
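To make the calling contract explicit: `run_all_benchmarks` reads the model dictionary with the keys `"onnx"`, `"ov"`, `"PyTorch_cpu"`, `"PyTorch_cuda"`, `"trt_fp32"`, and `"trt_fp16"`, and returns timings keyed by backend name. Below is a minimal sketch of the visualization step; the timing values are illustrative placeholders, not measured results, and in the real flow they come from `run_all_benchmarks(models, img_batch)`.

```python
from benchmark.benchmark_utils import plot_benchmark_results

# Illustrative values in milliseconds, NOT measured results; in the real flow this
# dictionary is returned by run_all_benchmarks(models, img_batch).
results = {
    "PyTorch_cpu": 85.0,
    "PyTorch_cuda": 9.0,
    "ONNX": 30.0,
    "OpenVINO": 25.0,
    "TRT_fp32": 6.0,
    "TRT_fp16": 3.0,
}

plot_benchmark_results(results)  # sorts by time, draws the Seaborn bar chart, saves ./inference/plot.png
```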

common/__init__.py

Whitespace-only changes.

common/utils.py

Lines changed: 65 additions & 0 deletions
```python
import argparse
import openvino as ov
import torch
from src.model import ModelLoader
from src.onnx_exporter import ONNXExporter
from src.ov_exporter import OVExporter
import onnxruntime as ort


def export_onnx_model(
    onnx_path: str, model_loader: ModelLoader, device: torch.device
) -> None:
    onnx_exporter = ONNXExporter(model_loader.model, device, onnx_path)
    onnx_exporter.export_model()


def init_onnx_model(
    onnx_path: str, model_loader: ModelLoader, device: torch.device
) -> ort.InferenceSession:
    export_onnx_model(onnx_path=onnx_path, model_loader=model_loader, device=device)
    return ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])


def init_ov_model(onnx_path: str) -> ov.CompiledModel:
    ov_exporter = OVExporter(onnx_path)
    return ov_exporter.export_model()


def init_cuda_model(
    model_loader: ModelLoader, device: torch.device, dtype: torch.dtype
) -> torch.nn.Module:
    cuda_model = model_loader.model.to(device)
    if device == "cuda":
        cuda_model = torch.jit.trace(
            cuda_model, [torch.randn((1, 3, 224, 224)).to(device)]
        )
    return cuda_model


def parse_arguments():
    # Initialize ArgumentParser with description
    parser = argparse.ArgumentParser(description="PyTorch Inference")
    parser.add_argument(
        "--image_path",
        type=str,
        default="./inference/cat3.jpg",
        help="Path to the image to predict",
    )
    parser.add_argument(
        "--topk", type=int, default=5, help="Number of top predictions to show"
    )
    parser.add_argument(
        "--onnx_path",
        type=str,
        default="./inference/model.onnx",
        help="Path where model in ONNX format will be exported",
    )
    parser.add_argument(
        "--mode",
        choices=["onnx", "ov", "cuda", "all"],
        default="all",
        help="Mode for exporting and running the model. Choices are: onnx, ov, cuda or all.",
    )

    return parser.parse_args()
```
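For context, here is a sketch of how `main.py` (not included in the files shown here) might wire these helpers together, matching the README's `python main.py --mode ...` usage. The `ModelLoader(device=...)` call and the overall control flow are assumptions, not the repository's actual entry point.

```python
# Hypothetical main.py wiring — a sketch only; the real entry point is not part of this excerpt.
import torch

from src.model import ModelLoader  # constructor signature assumed
from common.utils import parse_arguments, init_onnx_model, init_ov_model, init_cuda_model


def main() -> None:
    args = parse_arguments()
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model_loader = ModelLoader(device=device)  # assumed signature

    if args.mode in ("onnx", "ov", "all"):
        # Export the model to ONNX and open an ONNX Runtime session.
        ort_session = init_onnx_model(args.onnx_path, model_loader, device)

    if args.mode in ("ov", "all"):
        # Compile the exported ONNX model with OpenVINO.
        ov_model = init_ov_model(args.onnx_path)

    if args.mode in ("cuda", "all"):
        # Move (and, on CUDA, JIT-trace) the PyTorch model.
        cuda_model = init_cuda_model(model_loader, device, torch.float16)

    # Prediction and benchmarking calls would follow here.


if __name__ == "__main__":
    main()
```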
