Commit 1e4eb06

committed
[WIP] Basic README on matrices.
1 parent fa50e9d commit 1e4eb06

2 files changed

Lines changed: 29 additions & 19 deletions

README.md

Lines changed: 8 additions & 19 deletions
@@ -1,6 +1,9 @@
1-
# ImageProcessing
1+
# Brahma.FSharp examples
22

3-
Simple image processing on GPGPU in F# using [Brahma.FSharp](https://github.com/YaccConstructor/Brahma.FSharp).
3+
A few examples of how to use GPGPU in F# code with [Brahma.FSharp](https://github.com/YaccConstructor/Brahma.FSharp).
4+
5+
- [Image processing](src/ImageProcessing)
6+
- [Matrix multiplication](src/MatrixMultiplication)
47

58
---
69

@@ -12,20 +15,13 @@ GitHub Actions |
1215
[![GitHub Actions](https://github.com/gsvgit/ImageProcessing/workflows/Build%20master/badge.svg)](https://github.com/gsvgit/ImageProcessing/actions?query=branch%3Amaster) |
1316
[![Build History](https://buildstats.info/github/chart/gsvgit/ImageProcessing)](https://github.com/gsvgit/ImageProcessing/actions?query=branch%3Amaster) |
1417

15-
## NuGet
16-
17-
Package | Stable | Prerelease
18-
--- | --- | ---
19-
ImageProcessing | |
20-
21-
2218
---
2319

2420
### Developing
2521

2622
Make sure the following **requirements** are installed on your system:
2723

28-
- [dotnet SDK](https://dotnet.microsoft.com/en-us/download/dotnet/7.0) 7.0 or higher
24+
- [dotnet SDK 9.0](https://dotnet.microsoft.com/en-us/download/dotnet/9.0) or higher
2925
- OpenCL-compatible device with respective driver installed.
3026

3127
---
@@ -34,12 +30,5 @@ Make sure the following **requirements** are installed on your system:
3430

3531

3632
```sh
37-
> build.cmd <optional buildtarget> // on windows
38-
$ ./build.sh <optional buildtarget>// on unix
39-
```
40-
41-
---
42-
43-
### Build Targets
44-
45-
For details look at [MiniScaffold](https://github.com/TheAngryByrd/MiniScaffold), we use it in our project.
33+
dotnet build -c Release
34+
```

src/MatrixMultiplication/README.md

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
## Matrix multiplication optimization step by step
2+
3+
A sequence of matrix multiplication optimizations inspired by the [OpenCL SGEMM tutorial by Cedric Nugteren](https://cnugteren.github.io/tutorial/pages/page1.html).
4+
Our goal is to show basic optimizations, so we omit some steps presented in the tutorial.
5+
All kernels compute **C = A * B** and are parametrized by element type and element-wise operations.
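
As a rough illustration of what "parametrized by element type and element-wise operations" can mean, here is a minimal CPU sketch in Python. It is not the Brahma.FSharp code from this repository; the function name `matmul` and its parameters `zero`, `add`, and `mul` are hypothetical names chosen for this example.

```python
def matmul(a, b, zero, add, mul):
    """Multiply matrices given as lists of rows, using caller-supplied operations.

    zero, add and mul are illustrative parameters: the identity of the
    "addition" and the two element-wise operations.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[zero for _ in range(m)] for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = zero
            for t in range(k):
                acc = add(acc, mul(a[i][t], b[t][j]))
            c[i][j] = acc
    return c


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

# Ordinary arithmetic: (+, *) with identity 0.
arithmetic = matmul(A, B, 0, lambda x, y: x + y, lambda x, y: x * y)

# Min-plus ("tropical") variant: (min, +) with identity +infinity.
tropical = matmul(A, B, float("inf"), min, lambda x, y: x + y)

print(arithmetic)  # [[19, 22], [43, 50]]
print(tropical)    # [[6, 7], [8, 9]]
```

The same loop structure serves both cases; only the operations and the identity element change.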
6+
7+
**kernel0 (K0)** is a naive kernel that accumulates results directly in **C**.
8+
9+
In **kernel1 (K1)** each thread uses a register to accumulate **C[i,j]** and writes the value to **C** once, at the end of the computation.
10+
Thus we reduce global memory IO.
11+
This kernel reproduces the [naive implementation from the tutorial](https://cnugteren.github.io/tutorial/pages/page3.html).
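
The difference between K0 and K1 can be sketched on the CPU by counting accesses to **C**. This is a Python model under stated assumptions, not the actual kernels: `CountingMatrix`, `k0_thread`, `k1_thread`, and `run` are hypothetical names, and a Python local variable stands in for a GPU register.

```python
class CountingMatrix:
    """Row-major matrix that counts reads and writes, standing in for global memory."""

    def __init__(self, rows, cols, fill=0):
        self.rows, self.cols = rows, cols
        self.data = [fill] * (rows * cols)
        self.reads = 0
        self.writes = 0

    def get(self, i, j):
        self.reads += 1
        return self.data[i * self.cols + j]

    def set(self, i, j, value):
        self.writes += 1
        self.data[i * self.cols + j] = value


def k0_thread(a, b, c, i, j):
    # K0 style: every step reads and writes C[i, j] in "global memory".
    for t in range(a.cols):
        c.set(i, j, c.get(i, j) + a.get(i, t) * b.get(t, j))


def k1_thread(a, b, c, i, j):
    # K1 style: accumulate in a local variable, write C[i, j] once.
    acc = 0
    for t in range(a.cols):
        acc += a.get(i, t) * b.get(t, j)
    c.set(i, j, acc)


def run(thread, n):
    # One simulated "thread" per element of an n x n result.
    a = CountingMatrix(n, n, 1)
    b = CountingMatrix(n, n, 2)
    c = CountingMatrix(n, n, 0)
    for i in range(n):
        for j in range(n):
            thread(a, b, c, i, j)
    return c


c0 = run(k0_thread, 4)
c1 = run(k1_thread, 4)
print(c0.reads, c0.writes)  # 64 64
print(c1.reads, c1.writes)  # 0 16
```

Both variants produce the same result, but in this model the K1 style touches **C** only once per element instead of twice per inner-loop step.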
12+
13+
**kernel2 (K2)** utilizes local memory to store tiles of matrices. The idea is based on [block matrix multiplication](https://en.wikipedia.org/wiki/Block_matrix#Multiplication).
14+
The corresponding kernel in the tutorial is [kernel 2](https://cnugteren.github.io/tutorial/pages/page4.html).
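
The tiling idea can be sketched sequentially in Python. This is an illustrative model, not the K2 kernel itself: `tiled_matmul`, `naive_matmul`, and the `tile` parameter are hypothetical names, the copied sub-matrices play the role of local memory, and the sketch assumes all dimensions are multiples of the tile size.

```python
def naive_matmul(a, b):
    # Reference implementation used only to check the tiled version.
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]


def tiled_matmul(a, b, tile):
    n, k, m = len(a), len(b), len(b[0])
    # Assumption of this sketch: no partial tiles.
    assert n % tile == 0 and k % tile == 0 and m % tile == 0
    c = [[0] * m for _ in range(n)]
    for bi in range(0, n, tile):        # one "work-group" per tile of C
        for bj in range(0, m, tile):
            # One accumulator per "thread" in the work-group.
            acc = [[0] * tile for _ in range(tile)]
            for bt in range(0, k, tile):
                # Copy one tile of A and one tile of B into "local memory".
                a_tile = [row[bt:bt + tile] for row in a[bi:bi + tile]]
                b_tile = [row[bj:bj + tile] for row in b[bt:bt + tile]]
                # Every thread reuses the loaded tiles.
                for i in range(tile):
                    for j in range(tile):
                        for t in range(tile):
                            acc[i][j] += a_tile[i][t] * b_tile[t][j]
            # Write the finished tile back to C.
            for i in range(tile):
                for j in range(tile):
                    c[bi + i][bj + j] = acc[i][j]
    return c


a = [[i * 4 + j for j in range(4)] for i in range(4)]
b = [[(i + j) % 3 for j in range(4)] for i in range(4)]
assert tiled_matmul(a, b, 2) == naive_matmul(a, b)
```

Each element of a loaded tile is read from the original matrix once and then reused `tile` times, which is where the saving in global memory traffic comes from.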
15+
16+
**kernel3 (K3)** implicitly reduces data transfer between local memory and registers by grouping computations.
17+
The corresponding kernel in the tutorial is [kernel 3](https://cnugteren.github.io/tutorial/pages/page5.html).
18+
19+
**kernel4 (K4)** is designed to use registers aggressively to hold tiles of the matrices.
20+
Thus we try to reduce data transfer between local memory and registers even more.
21+
The corresponding kernel in the tutorial is [kernel 6](https://cnugteren.github.io/tutorial/pages/page8.html).
