Discussed in #1126
Originally posted by pkese October 28, 2023
If anyone is interested...
I made a small language model inspired by https://github.com/karpathy/nanoGPT in both PyTorch and TorchSharp.
The model has two transformer layers totalling 150k parameters and is trained on Shakespeare's text.
I found that moving to smaller data types improves training time, as does PyTorch's `jit.compile`, which is not available in TorchSharp.
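For context, here is a minimal sketch of the compile call (assuming it refers to PyTorch 2.x's `torch.compile`; the `Linear` module is just an illustrative stand-in, not the actual nanoGPT-style model):

```python
import torch

# Stand-in module; the real model is the 2-layer transformer described above.
model = torch.nn.Linear(8, 8)

# PyTorch 2.x compilation. backend="eager" skips code generation so this
# sketch runs anywhere; the benchmarks below would use the default
# inductor backend on CUDA.
compiled = torch.compile(model, backend="eager")
out = compiled(torch.zeros(1, 8))
```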
Here are some benchmarks of model training times (minutes and seconds) with CUDA on a small GPU (RTX 3070).
|                     | default | tf32 | bf16 |
|---------------------|---------|------|------|
| TorchSharp 0.100.7  | 6:46    | 5:20 | N/A  |
| PyTorch 2.0.1       | 5:31    | 5:27 | 4:28 |
| PyTorch+jit.compile | 4:04    | 3:57 | 2:26 |
For `bf16` I used:

```python
from torch.cuda.amp import autocast

with autocast(dtype=torch.bfloat16):
    <train code>
```
I couldn't achieve the same bf16 functionality with TorchSharp.
I don't quite understand why default TorchSharp code is slower than default PyTorch code.
After setting `torch.backends.cuda.matmul.allow_tf32 = true` in both Python and TorchSharp, I get comparable performance (compare the first and second result columns).
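Concretely, the Python side of that switch is the one-liner below (TorchSharp mirrors the same property path, with C#'s lowercase `true`):

```python
import torch

# Enable TensorFloat-32 matmuls on Ampere+ GPUs (the "tf32" column above).
torch.backends.cuda.matmul.allow_tf32 = True
# Often paired with the cuDNN equivalent for convolutions:
torch.backends.cudnn.allow_tf32 = True
```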
If someone is interested I can publish the code.
(I was also trying to get TorchScript models working on both sides, which messed up the code quite a bit ... and I might want to revert that.)
BTW, the TorchScript model was 1% slower to train in PyTorch and crashed in TorchSharp.
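The round trip I was attempting looks roughly like this (a minimal sketch; `TinyModel` and the file name are illustrative stand-ins, not the actual code):

```python
import torch

# Illustrative stand-in for the real 2-layer transformer.
class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        return self.linear(x)

# Script and save on the Python side...
scripted = torch.jit.script(TinyModel())
scripted.save("tiny.pt")

# ...then the C# side would load it with TorchSharp's torch.jit.load("tiny.pt").
```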