143 commits
baf3cc6
model : clarify MTP layer comment in qwen35.cpp [no ci] (#23338)
danbev May 19, 2026
ac76808
hexagon: enable support for NORM op (#23319)
aparmp-quic May 19, 2026
b7393a4
convert : update mtp related help (#23334)
CISC May 19, 2026
7256fce
common: fix --fit verbosity with --verbosity 4 (#23282)
JohannesGaessler May 19, 2026
57cb35c
common: fix --help for --verbosity (#23278)
JohannesGaessler May 19, 2026
a807867
github: mention --log-file in issue templates (#23277)
JohannesGaessler May 19, 2026
67ace02
refactor: Chat Screen UI rendering (#23333)
allozaur May 19, 2026
17d22a3
hexagon: add MROPE and IMROPE support in HTP rope op (#23317)
aparmp-quic May 19, 2026
b28a2f3
opencl: add MoE support for q4_k, q5_k, q6_k on Adreno (#23303)
shaofeiqi May 19, 2026
b39a7bf
ggml-cuda: tune RDNA3 Q6_K MMVQ nwarps (#23349)
ravel7524 May 20, 2026
871b0b7
snapdragon: update toolchain to v0.6 (#23369)
max-krasnyansky May 20, 2026
57ebaf4
metal : optimize pad + cpy (#23354)
ggerganov May 20, 2026
585080d
fix: Div wrapper no pointer events on hidden (#23390)
allozaur May 20, 2026
5028447
ui: Refactor `isMobile` as reactive value in `viewport` store (#23330)
allozaur May 20, 2026
7e50ef7
docker : copy conversion files (#23370)
CISC May 20, 2026
e2b129e
mtmd: fit_params now take into account mmproj (#21489)
ngxson May 20, 2026
e6b4acf
refactor: Move text attachments up before the message content in chat…
allozaur May 20, 2026
29f1482
app : introduce the llama unified executable (#23296)
angt May 20, 2026
e947228
Programmatic Dependent Launch (PDL) for more performance on newer NVI…
aendk May 20, 2026
c9872a2
hexagon: HMX quantized matmul rework (#23368)
max-krasnyansky May 20, 2026
6ce9671
feat: Add WAV MIME type variants and improve audio format detection (…
allozaur May 20, 2026
acd604f
vulkan: optimize operations in the IM2COL shader (#22685)
daniandtheweb May 20, 2026
a8681a0
mtmd : DeepSeek-OCR image processing fixes, img_tool::resize padding …
sfallah May 20, 2026
510b5c2
common/speculative : fix nullptr crash in get_devices_str (#23386)
ggerganov May 20, 2026
3a6db74
opencl: refactor backend initilization (#23318)
lhez May 20, 2026
ad27757
Move to backend sampling for MTP draft path (#23287)
gaugarg-nv May 20, 2026
3a479c9
ui: Add max image size option (#22849)
stduhpf May 20, 2026
6a257d4
mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision prec…
wendadawen May 20, 2026
ce02093
app : show version (#23426)
angt May 21, 2026
0be8468
hexagon: ssm-conv fix for large prompts (#23307)
tboinovski1 May 21, 2026
eeeaf61
llama-graph: fix null-buffer crash in llm_graph_input_attn_kv_iswa fo…
ssfdre38 May 21, 2026
2754ce1
ggml : Check the right iface method before using the fallback 2d get …
TheBlueMatt May 21, 2026
5e932a1
ui: Improve Git Hooks for UI development (#23403)
allozaur May 21, 2026
2fc8d18
doc: fix spec mtp typo (#23435)
ruixiang63 May 21, 2026
7ea23dd
vocab : add Carbon-3B (HybridDNATokenizer) support (#23410)
kashif May 21, 2026
12e5d99
mtp: use inp_out_ids for skipping logit computation (#23433)
am17an May 21, 2026
1d7ab2b
app : add batched-bench, fit-params, quantize & perplexity (#23459)
angt May 21, 2026
c902171
server: re-inject subcommand when router spawns children under unifie…
ServeurpersoCom May 21, 2026
52fb93a
server : free draft/MTP resources on sleep to fix VRAM leak (#23461)
am17an May 21, 2026
a1a69f7
metal : optimize concat kernel and fix set kernel threads (#23411)
ggerganov May 21, 2026
b65bb4b
server: expose prompt token counts in /slots endpoint (#23454)
ScrewTSW May 21, 2026
40d5358
tests : move save-load-state from examples to tests (#23336)
ggerganov May 21, 2026
5306f4b
fix(flash-attn): replace f32 with kv_type and q_type (#23372)
Constannnnnt May 21, 2026
47c0eda
vulkan: fuse snake activation (mul, sin, sqr, mul, add) (#22855)
ServeurpersoCom May 21, 2026
ee7c305
Update WebGPU support and add link to blog/demo (#23483)
reeselevine May 21, 2026
bb28c1f
cmake : remove STATIC from impl libraries, enable LLAMA_BUILD_APP by …
ggerganov May 21, 2026
4f0e43d
CUDA: fix PDL CC check for JIT compilation (#23471)
JohannesGaessler May 21, 2026
bbce619
cmake : add install() for impl libraries + fix apple builds (#23511)
ggerganov May 22, 2026
afcda09
vocab : fix HybridDNA tokenizer (#23466)
kashif May 22, 2026
9c92e96
cmake : build router app only during standalone builds (#23521)
fairydreaming May 22, 2026
99d4026
ggml-zendnn : add Q8_0 quantization support (#23414)
z-sachin May 22, 2026
95feeab
docs: Update documentation with Granite 4.0/4.1 (#23404)
jesus-talavera-ibm May 22, 2026
8cc67ef
SYCL: add BF16 to DMMV kernel path (~4x tg speedup on Intel Arc) (#21…
PMZFX May 22, 2026
56f16f2
SYCL : gated_delta_net K>1 (#23174)
karavayev May 22, 2026
bcfd198
sycl : Level Zero detection in ggml_sycl_init (#23097)
sanmai May 22, 2026
cc9e331
SYCL: improve MoE prefill throughput (#23142)
sanmai May 22, 2026
ef570f6
perplexity : fix integer overflow (#23496)
fairydreaming May 22, 2026
1acee6b
server: only parse empty msg if continuing an assistant msg (#23506)
aldehir May 22, 2026
0f3cb3f
opencl: generalize Adreno MoE kernels on M (#23449)
shawngu-quic May 23, 2026
95405ac
vulkan: fix windows find_package of SPIRV-Headers (#23215)
jeffbolznv May 23, 2026
a497476
ggml : Check the right iface method before using the fallback 2d get …
dskwe May 23, 2026
b0df4c0
model : add NVFP4 MTP scale tensors (#23563)
michaelw9999 May 23, 2026
c0c7e14
requirements : bump torch to 2.11.0 (#23503)
adityasingh2400 May 23, 2026
b22ff4b
cmake/ui : refactor the build (#23352)
aldehir May 23, 2026
cec51c7
snapdragon: update windows toolchain to use hsdk v6.6.0.0 (#23552)
aparmp-quic May 24, 2026
1c0f6db
hexagon: apply repl optimization in flash attn softmax as #22993 (#23…
njsyw1997 May 24, 2026
f306111
opencl: batch profiling to improve speed and prevent memory leaks (#2…
shaofeiqi May 24, 2026
fff63b5
TP: fix entirely zero-sized slices per device (#23525)
JohannesGaessler May 24, 2026
83eebe9
server: add margin for draft model for `fit` (#23485)
am17an May 24, 2026
63248fc
cmake : fix ui build (#23592)
aldehir May 24, 2026
5d246a7
convert : minor fixes for numpy 2.x (#23571)
CISC May 24, 2026
549b9d8
ci : update build-self-hosted.yml (#23616)
ggerganov May 24, 2026
28123a3
ci : move most slim jobs to self-hosted runners (#23619)
ggerganov May 25, 2026
6d57c26
perplexity : fix even more integer overflows (#23623)
fairydreaming May 25, 2026
e2ef8fe
server: fix checkpoints creation (#22929)
jacekpoplawski May 25, 2026
9627d0f
vendor : update cpp-httplib to 0.45.1 (#23639)
cabelo May 25, 2026
b964876
ui: media attachments before text (#23467)
sfallah May 25, 2026
826539c
ggml : Parallelize quant LUT init (#23595)
jeffbolznv May 25, 2026
d55fb97
ci : install host compiler on android-ndk build (#23630)
aldehir May 25, 2026
314e729
llama : document that only one on-device state can be saved per seque…
TimNN May 25, 2026
062d311
ci : fix pre-tokenizer-hashes check (#23651)
CISC May 25, 2026
5fdf07e
ci : update spacemit toolchain url and enhance curl command (#23642)
alex-spacemit May 25, 2026
6c4cbdc
server: MTP layer kv-cache should respect draft type ctk (#23646)
am17an May 25, 2026
66efd13
ggml: `gguf_init_from_callback` and `gguf_init_from_buffer` (#22341)
giladgd May 25, 2026
ae251b5
TP: fix ggml context size calculation (#22616)
JohannesGaessler May 25, 2026
fa97041
ggml-alloc: fix out-of-bounds read in ggml_dyn_tallocr_remove_block (…
Dev-X25874 May 21, 2026
b251f74
ggml.h: correct ggml_silu_back arg docstring (a=dy, b=x) (ggml/1500)
OriPekelman May 21, 2026
ce5890b
ggml : bump version to 0.12.1 (ggml/1508)
ggerganov May 25, 2026
22307b3
sync : ggml
ggerganov May 25, 2026
45158f4
ggml : bump version to 0.13.0 (ggml/1510)
ggerganov May 25, 2026
d161ea7
sync : ggml
ggerganov May 25, 2026
a4d2d4a
convert : add compressed-tensors NVFP4 support (#21095)
michaelw9999 May 25, 2026
5a4126a
ui: fix stop/continue during an agentic loop (#23356)
ServeurpersoCom May 25, 2026
c1f1e28
CUDA: add fast walsh-hadamard transform (#23615)
am17an May 25, 2026
328874d
model: tag ffn_latent as MUL_MAT to fix buft probe (#23664)
ServeurpersoCom May 25, 2026
302e2c2
ci : reduce PR jobs by matching backend paths (#23675)
ggerganov May 25, 2026
4bead4e
snapdragon: bump toolchain docker to v0.7 to fix ui build issues (#23…
max-krasnyansky May 25, 2026
35c9b1f
metal : add apple device id (#23566)
forforever73 May 25, 2026
192d8ae
CUDA: missing PDL sync for FWHT, better fallback (#23690)
JohannesGaessler May 26, 2026
54121f7
[WebGPU] Check batch_compute_passes before sending passes when not do…
nikhilJain17 May 26, 2026
1506d39
ggml-webgpu: Add MMVQ path for Q4/Q8/Q2_K/Q4_K and clean up legacy MU…
yomaytk May 26, 2026
c9d9829
model : add support for talkie-1930-13b (#22596)
niklassheth May 26, 2026
7623de1
tests: test-backend-ops -j <N> to run tests in parallel (#23637)
jeffbolznv May 26, 2026
581d020
SYCL: implement ggml_sycl_pool_vmm (#22862)
sanmai May 26, 2026
6fe90de
models : Attach Mistral3 NVFP4 weight scales (#23629)
michaelw9999 May 26, 2026
dbe9c0c
convert : support Gemma4ForCausalLM architecture (#23682)
aoleg May 26, 2026
3dc7684
ci : reduce (disable SYCL and CANN builds/releases) (#23705)
ggerganov May 26, 2026
ef41a69
ci : move sanitizer jobs to self-hosted runners (#23713)
ggerganov May 26, 2026
678d43d
ci : move more CPU jobs to self-hosted runners (#23715)
ggerganov May 26, 2026
ef66bfa
hexagon: add support for CONCAT op (#23648)
max-krasnyansky May 26, 2026
3a3ed15
ci : remove vulkan SDK dep from webgpu job (#23718)
ggerganov May 26, 2026
7799d31
vulkan: optimize conv2d and implement coopmat1 support (#22620)
jeffbolznv May 26, 2026
5190c2e
ci : move macos jobs to the apple workflow + fix names (#23721)
ggerganov May 26, 2026
35a74c8
ci : add `[no release]` keyword + fix sanitizer builds (#23728)
ggerganov May 26, 2026
08bc21b
ci : move [no release] check to dedicated check_release job (#23734)
ggerganov May 26, 2026
0d18aaa
ci : do not allocate ccache for 3rd-party hosted runners (#23730)
ggerganov May 26, 2026
b4c0549
ggml-zendnn : fixed naming of matmul function (#20964)
truecoder34 May 26, 2026
7085492
server : fix the log message when using SSL (#23393)
rgerganov May 27, 2026
9777256
convert: add MiniCPM5 tokenizer support (#23384)
zhangtao2-1 May 27, 2026
1d971bb
docs : fix duplicated "the" in granitevision and model-conversion doc…
quyentonndbs May 27, 2026
0d227ec
ci : add ccache to server builds + fix undefined sanitizer build (#23…
ggerganov May 27, 2026
4d8cc0c
vulkan: avoid preferring transfer queue on AMD UMA devices (#22455)
winstonma May 27, 2026
b3a739c
ci : remove wasm test (#23733)
CISC May 27, 2026
9f0e4b1
ci : fix windows ccaches (#23777)
ggerganov May 27, 2026
6b4e4bd
common : fix env names to all have LLAMA_ARG_ prefix (#23778)
ggerganov May 27, 2026
2d0656f
ci : bump cuda release to 13.3 (#23749)
CISC May 27, 2026
fda8528
CUDA: restrict PDL to CTK >= 12.3 due to MSVC issues (#23742)
ORippler May 27, 2026
87b0a60
pyproject : add conversion folder and update dependencies (#23746)
CISC May 27, 2026
617255d
vendor : update cpp-httplib to 0.46.0 (#23650)
cabelo May 27, 2026
ba4dd0b
ci : move ARM jobs to self-hosted + disable kleidiai mac release (#23…
ggerganov May 27, 2026
837bb6b
vulkan: add REPEAT op support for f16 to f16. (#23298)
l8bloom May 27, 2026
b36eefc
vulkan: use GL_NV_cooperative_matrix_decode_vector for faster matmul …
jeffbolznv May 27, 2026
c6e4088
vulkan: Switch MUL_MAT_VEC to 4 K per iteration for F16/32 (#22887)
TheBlueMatt May 27, 2026
c40006a
ggml-webgpu: Fix how to dispatch WG to some ops (#23750)
yomaytk May 27, 2026
aa50b2c
hexagon: add support for Q4_1 in MUL_MAT and MUL_MAT_ID (#23647)
max-krasnyansky May 27, 2026
f12cc6d
ggml-webgpu: remove legacy constants (#23672)
reeselevine May 27, 2026
dfc02c9
llama: Gemma 4 MTP
am17an May 19, 2026
ee1ee38
fix multi-seq
am17an May 19, 2026
01134dd
add assert that draft + shared kv should be on same device
am17an May 20, 2026
1e4fb9f
add Q rot when cache is quantized
am17an May 21, 2026
c073320
add temp hack to not use fit with gemma4, rm later
am17an May 28, 2026
79098ee
add exception in test-llama-archs
am17an May 28, 2026
e21d64b
move assistant to separate file
am17an May 28, 2026
1 change: 1 addition & 0 deletions .devops/cann.Dockerfile
@@ -58,6 +58,7 @@ RUN mkdir -p /app/lib && \
RUN mkdir -p /app/full && \
cp build/bin/* /app/full/ && \
cp *.py /app/full/ && \
+ cp -r conversion /app/full/ && \
cp -r gguf-py /app/full/ && \
cp -r requirements /app/full/ && \
cp requirements.txt /app/full/
1 change: 1 addition & 0 deletions .devops/cpu.Dockerfile
@@ -30,6 +30,7 @@ RUN mkdir -p /app/lib && \
RUN mkdir -p /app/full \
&& cp build/bin/* /app/full \
&& cp *.py /app/full \
+ && cp -r conversion /app/full \
&& cp -r gguf-py /app/full \
&& cp -r requirements /app/full \
&& cp requirements.txt /app/full \
1 change: 1 addition & 0 deletions .devops/cuda.Dockerfile
@@ -36,6 +36,7 @@ RUN mkdir -p /app/lib && \
RUN mkdir -p /app/full \
&& cp build/bin/* /app/full \
&& cp *.py /app/full \
+ && cp -r conversion /app/full \
&& cp -r gguf-py /app/full \
&& cp -r requirements /app/full \
&& cp requirements.txt /app/full \
1 change: 1 addition & 0 deletions .devops/intel.Dockerfile
@@ -36,6 +36,7 @@ RUN mkdir -p /app/lib && \
RUN mkdir -p /app/full \
&& cp build/bin/* /app/full \
&& cp *.py /app/full \
+ && cp -r conversion /app/full \
&& cp -r gguf-py /app/full \
&& cp -r requirements /app/full \
&& cp requirements.txt /app/full \
1 change: 1 addition & 0 deletions .devops/musa.Dockerfile
@@ -41,6 +41,7 @@ RUN mkdir -p /app/lib && \
RUN mkdir -p /app/full \
&& cp build/bin/* /app/full \
&& cp *.py /app/full \
+ && cp -r conversion /app/full \
&& cp -r gguf-py /app/full \
&& cp -r requirements /app/full \
&& cp requirements.txt /app/full \
1 change: 1 addition & 0 deletions .devops/openvino.Dockerfile
@@ -81,6 +81,7 @@ RUN mkdir -p /app/lib && \
RUN mkdir -p /app/full \
&& cp build/ReleaseOV/bin/* /app/full/ \
&& cp *.py /app/full \
+ && cp -r conversion /app/full \
&& cp -r gguf-py /app/full \
&& cp -r requirements /app/full \
&& cp requirements.txt /app/full \
1 change: 1 addition & 0 deletions .devops/rocm.Dockerfile
@@ -53,6 +53,7 @@ RUN mkdir -p /app/lib \
RUN mkdir -p /app/full \
&& cp build/bin/* /app/full \
&& cp *.py /app/full \
+ && cp -r conversion /app/full \
&& cp -r gguf-py /app/full \
&& cp -r requirements /app/full \
&& cp requirements.txt /app/full \
9 changes: 6 additions & 3 deletions .devops/s390x.Dockerfile
@@ -37,6 +37,7 @@ RUN --mount=type=cache,target=/root/.ccache \

COPY *.py /opt/llama.cpp/bin
COPY .devops/tools.sh /opt/llama.cpp/bin
+ COPY conversion /opt/llama.cpp/conversion

COPY gguf-py /opt/llama.cpp/gguf-py
COPY requirements.txt /opt/llama.cpp/gguf-py
@@ -47,9 +48,10 @@ COPY requirements /opt/llama.cpp/gguf-py/requirements
FROM scratch AS collector

# Copy llama.cpp binaries and libraries
- COPY --from=build /opt/llama.cpp/bin /llama.cpp/bin
- COPY --from=build /opt/llama.cpp/lib /llama.cpp/lib
- COPY --from=build /opt/llama.cpp/gguf-py /llama.cpp/gguf-py
+ COPY --from=build /opt/llama.cpp/bin /llama.cpp/bin
+ COPY --from=build /opt/llama.cpp/lib /llama.cpp/lib
+ COPY --from=build /opt/llama.cpp/gguf-py /llama.cpp/gguf-py
+ COPY --from=build /opt/llama.cpp/conversion /llama.cpp/conversion


### Base image
@@ -107,6 +109,7 @@ RUN curl https://sh.rustup.rs -sSf | bash -s -- -y

COPY --from=collector /llama.cpp/bin /app
COPY --from=collector /llama.cpp/gguf-py /app/gguf-py
+ COPY --from=collector /llama.cpp/conversion /app/conversion

RUN pip install --no-cache-dir --break-system-packages \
-r /app/gguf-py/requirements.txt
1 change: 1 addition & 0 deletions .devops/vulkan.Dockerfile
@@ -26,6 +26,7 @@ RUN mkdir -p /app/lib && \
RUN mkdir -p /app/full \
&& cp build/bin/* /app/full \
&& cp *.py /app/full \
+ && cp -r conversion /app/full \
&& cp -r gguf-py /app/full \
&& cp -r requirements /app/full \
&& cp requirements.txt /app/full \
4 changes: 2 additions & 2 deletions .github/ISSUE_TEMPLATE/011-bug-results.yml
@@ -100,8 +100,8 @@ body:
label: Relevant log output
description: >
Please copy and paste any relevant log output, including the command that you entered and any generated text.
- For very long logs (thousands of lines), preferably upload them as files instead.
- On Linux you can redirect console output into a file by appending ` > llama.log 2>&1` to your command.
+ For very long logs (thousands of lines), please upload them as files instead; the `--log-file` CLI argument can be used for this purpose.
+ On Linux you can alternatively redirect the console output of any command into a file by appending ` > llama.log 2>&1` to your command.
value: |
<details>
<summary>Logs</summary>
4 changes: 2 additions & 2 deletions .github/ISSUE_TEMPLATE/019-bug-misc.yml
@@ -88,8 +88,8 @@ body:
description: >
If applicable, please copy and paste any relevant log output, including any generated text.
If you are encountering problems specifically with the `llama_params_fit` module, always upload `--verbose` logs as well.
- For very long logs (thousands of lines), please upload them as files instead.
- On Linux you can redirect console output into a file by appending ` > llama.log 2>&1` to your command.
+ For very long logs (thousands of lines), please upload them as files instead; the `--log-file` CLI argument can be used for this purpose.
+ On Linux you can alternatively redirect the console output of any command into a file by appending ` > llama.log 2>&1` to your command.
value: |
<details>
<summary>Logs</summary>
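The redirection suggested in the template text can be wrapped in a small helper. This is a minimal sketch assuming a POSIX shell; `run_logged` is a hypothetical name that is not part of llama.cpp, and the `--log-file` argument mentioned in the template is not exercised here.

```shell
#!/bin/sh
# run_logged is a hypothetical helper (not part of llama.cpp).
# Usage: run_logged <logfile> <command> [args...]
# Runs the command with stdout and stderr both redirected into <logfile>,
# the same effect as appending ` > llama.log 2>&1` to the command.
# The helper returns the exit status of the wrapped command.
run_logged() {
    logfile=$1
    shift
    "$@" > "$logfile" 2>&1
}
```

Calling `run_logged llama.log <command> <args>` leaves the full console output in `llama.log`, ready to attach to the issue.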
2 changes: 1 addition & 1 deletion .github/actions/linux-setup-spacemit/action.yml
@@ -15,6 +15,6 @@ runs:
id: setup
uses: ./.github/actions/unarchive-tar
with:
- url: https://archive.spacemit.com/toolchain/spacemit-toolchain-linux-glibc-x86_64-v${{ inputs.version }}.tar.xz
+ url: https://github.com/spacemit-com/toolchain/releases/download/v${{ inputs.version }}/spacemit-toolchain-linux-glibc-x86_64-v${{ inputs.version }}.tar.xz
path: ${{ inputs.path }}
strip: 1
2 changes: 1 addition & 1 deletion .github/actions/unarchive-tar/action.yml
@@ -24,4 +24,4 @@ runs:
run: |
mkdir -p ${{ inputs.path }}
cd ${{ inputs.path }}
- curl --no-progress-meter ${{ inputs.url }} | tar -${{ inputs.type }}x --strip-components=${{ inputs.strip }}
+ curl --no-progress-meter -L ${{ inputs.url }} | tar -${{ inputs.type }}x --strip-components=${{ inputs.strip }}
31 changes: 31 additions & 0 deletions .github/actions/windows-setup-cuda/action.yml
@@ -96,3 +96,34 @@ runs:
echo "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
echo "CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1" | Out-File -FilePath $env:GITHUB_ENV -Append -Encoding utf8
echo "CUDA_PATH_V13_1=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.1" | Out-File -FilePath $env:GITHUB_ENV -Append -Encoding utf8

+ - name: Install Cuda Toolkit 13.3
+ if: ${{ inputs.cuda_version == '13.3' }}
+ shell: pwsh
+ run: |
+ mkdir -p "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3"
+ choco install unzip -y
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_crt/windows-x86_64/cuda_crt-windows-x86_64-13.3.33-archive.zip"
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_cudart/windows-x86_64/cuda_cudart-windows-x86_64-13.3.29-archive.zip"
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvcc/windows-x86_64/cuda_nvcc-windows-x86_64-13.3.33-archive.zip"
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvrtc/windows-x86_64/cuda_nvrtc-windows-x86_64-13.3.33-archive.zip"
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/libcublas/windows-x86_64/libcublas-windows-x86_64-13.5.1.27-archive.zip"
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/libnvvm/windows-x86_64/libnvvm-windows-x86_64-13.3.33-archive.zip"
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvtx/windows-x86_64/cuda_nvtx-windows-x86_64-13.3.29-archive.zip"
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cuda_profiler_api/windows-x86_64/cuda_profiler_api-windows-x86_64-13.3.27-archive.zip"
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/visual_studio_integration/windows-x86_64/visual_studio_integration-windows-x86_64-13.3.27-archive.zip"
+ curl -O "https://developer.download.nvidia.com/compute/cuda/redist/cccl/windows-x86_64/cccl-windows-x86_64-13.3.3.3.1-archive.zip"
+ unzip '*.zip' -d "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3"
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3\cuda_crt-windows-x86_64-13.3.33-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3" /E /I /H /Y
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3\cuda_cudart-windows-x86_64-13.3.29-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3" /E /I /H /Y
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3\cuda_nvcc-windows-x86_64-13.3.33-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3" /E /I /H /Y
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3\cuda_nvrtc-windows-x86_64-13.3.33-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3" /E /I /H /Y
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3\libcublas-windows-x86_64-13.5.1.27-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3" /E /I /H /Y
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3\libnvvm-windows-x86_64-13.3.33-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3" /E /I /H /Y
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3\cuda_nvtx-windows-x86_64-13.3.29-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3" /E /I /H /Y
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3\cuda_profiler_api-windows-x86_64-13.3.27-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3" /E /I /H /Y
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3\visual_studio_integration-windows-x86_64-13.3.27-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3" /E /I /H /Y
+ xcopy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3\cccl-windows-x86_64-13.3.3.3.1-archive\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3" /E /I /H /Y
+ echo "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
+ echo "CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3" | Out-File -FilePath $env:GITHUB_ENV -Append -Encoding utf8
+ echo "CUDA_PATH_V13_3=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.3" | Out-File -FilePath $env:GITHUB_ENV -Append -Encoding utf8
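Every download in the step above follows the same URL layout, so the list could be generated from component/version pairs. The sketch below is written in POSIX shell rather than the `pwsh` the workflow step uses; `cuda_redist_url` is a hypothetical helper, and the layout is inferred only from the URLs in this hunk, not from NVIDIA documentation.

```shell
#!/bin/sh
# cuda_redist_url is a hypothetical helper; the URL layout is inferred from the
# download lines in the workflow step above.
# Usage: cuda_redist_url <component> <version>
cuda_redist_url() {
    component=$1
    version=$2
    base="https://developer.download.nvidia.com/compute/cuda/redist"
    printf '%s/%s/windows-x86_64/%s-windows-x86_64-%s-archive.zip\n' \
        "$base" "$component" "$component" "$version"
}

# Component:version pairs copied from the hunk (subset shown for brevity).
for pair in cuda_crt:13.3.33 cuda_cudart:13.3.29 libcublas:13.5.1.27; do
    # ${pair%%:*} is the component name, ${pair#*:} is its version.
    cuda_redist_url "${pair%%:*}" "${pair#*:}"
done
```

The components do not share one version number (for example `13.3.33`, `13.3.29`, and `13.5.1.27` for `libcublas`), which is why each pair carries its own version instead of a single toolkit-wide value.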
6 changes: 3 additions & 3 deletions .github/workflows/build-3rd-party.yml
@@ -22,9 +22,9 @@ concurrency:
env:
GGML_NLOOP: 3
GGML_N_THREADS: 1
- LLAMA_LOG_COLORS: 1
- LLAMA_LOG_PREFIX: 1
- LLAMA_LOG_TIMESTAMPS: 1
+ LLAMA_ARG_LOG_COLORS: 1
+ LLAMA_ARG_LOG_PREFIX: 1
+ LLAMA_ARG_LOG_TIMESTAMPS: 1

jobs:
ubuntu-24-llguidance:
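Scripts outside this repository that still export the old `LLAMA_LOG_*` names may need the same rename. The sketch below is a compatibility shim assuming a POSIX shell; `migrate_llama_log_env` is a hypothetical helper, it covers only the three variables renamed in this hunk, and whether the old names are still honored by the binaries is not stated in the diff.

```shell
#!/bin/sh
# migrate_llama_log_env is a hypothetical helper (not part of llama.cpp).
# For each variable renamed in this hunk, copy the value of the old
# LLAMA_LOG_* name to the new LLAMA_ARG_LOG_* name, unless the new name
# already has a value.
migrate_llama_log_env() {
    for suffix in COLORS PREFIX TIMESTAMPS; do
        old="LLAMA_LOG_${suffix}"
        new="LLAMA_ARG_LOG_${suffix}"
        # Indirect lookup via eval; the names are fixed strings, not user input.
        eval "old_val=\${${old}:-}"
        eval "new_val=\${${new}:-}"
        if [ -n "$old_val" ] && [ -z "$new_val" ]; then
            export "${new}=${old_val}"
        fi
    done
}
```

Letting an already-set new name win keeps the shim safe to call in environments that have been partially migrated.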
4 changes: 2 additions & 2 deletions .github/workflows/build-and-test-snapdragon.yml
@@ -31,7 +31,7 @@ jobs:
android-ndk-snapdragon:
runs-on: ubuntu-latest
container:
- image: 'ghcr.io/snapdragon-toolchain/arm64-android:v0.3'
+ image: 'ghcr.io/snapdragon-toolchain/arm64-android:v0.7'
defaults:
run:
shell: bash
@@ -61,7 +61,7 @@ jobs:
linux-iot-snapdragon:
runs-on: ubuntu-latest
container:
- image: 'ghcr.io/snapdragon-toolchain/arm64-linux:v0.1'
+ image: 'ghcr.io/snapdragon-toolchain/arm64-linux:v0.7'
defaults:
run:
shell: bash
61 changes: 58 additions & 3 deletions .github/workflows/build-android.yml
@@ -27,9 +27,9 @@ concurrency:
env:
GGML_NLOOP: 3
GGML_N_THREADS: 1
- LLAMA_LOG_COLORS: 1
- LLAMA_LOG_PREFIX: 1
- LLAMA_LOG_TIMESTAMPS: 1
+ LLAMA_ARG_LOG_COLORS: 1
+ LLAMA_ARG_LOG_PREFIX: 1
+ LLAMA_ARG_LOG_TIMESTAMPS: 1

jobs:
android:
@@ -73,6 +73,11 @@ jobs:
fetch-depth: 0
lfs: false

+ - name: Dependencies
+ run: |
+ apt-get update
+ apt-get install -y build-essential

- name: Build
id: ndk_build
run: |
@@ -86,3 +91,53 @@ jobs:
with:
name: llama-cpp-android-arm64-cpu
path: pkg-adb/llama.cpp

+ android-arm64:
+ runs-on: ubuntu-latest
+
+ env:
+ NDK_VERSION: "29.0.14206865"
+
+ steps:
+ - name: Clone
+ id: checkout
+ uses: actions/checkout@v6
+
+ - name: ccache
+ uses: ggml-org/ccache-action@v1.2.21
+ with:
+ key: android-arm64
+ evict-old-files: 1d
+ save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}
+
+ - name: Set up JDK
+ uses: actions/setup-java@v5
+ with:
+ java-version: 17
+ distribution: temurin
+
+ - name: Setup Android SDK
+ uses: android-actions/setup-android@40fd30fb8d7440372e1316f5d1809ec01dcd3699 # v4.0.1
+ with:
+ log-accepted-android-sdk-licenses: false
+
+ - name: Install NDK
+ run: |
+ sdkmanager "ndk;${{ env.NDK_VERSION }}"
+ echo "ANDROID_NDK=${ANDROID_SDK_ROOT}/ndk/${{ env.NDK_VERSION }}" >> $GITHUB_ENV
+
+ - name: Build
+ id: cmake_build
+ run: |
+ cmake -B build \
+ -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake \
+ -DANDROID_ABI=arm64-v8a \
+ -DANDROID_PLATFORM=android-28 \
+ -DLLAMA_FATAL_WARNINGS=ON \
+ -DGGML_BACKEND_DL=ON \
+ -DGGML_NATIVE=OFF \
+ -DGGML_CPU_ALL_VARIANTS=ON \
+ -DGGML_OPENMP=OFF \
+ -DLLAMA_BUILD_BORINGSSL=ON \
+ -DGGML_RPC=ON
+ time cmake --build build --config Release -j $(nproc)
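The `Build` step relies on `ANDROID_NDK` having been written to `$GITHUB_ENV` by the preceding `Install NDK` step. The sketch below is a guard that fails early when the toolchain file is missing, assuming a POSIX shell; `ndk_toolchain_file` is a hypothetical helper that is not part of this workflow.

```shell
#!/bin/sh
# ndk_toolchain_file is a hypothetical helper (not part of the workflow).
# Prints the path of the NDK CMake toolchain file used by the Build step,
# or returns non-zero with a message on stderr if it cannot be found.
ndk_toolchain_file() {
    if [ -z "${ANDROID_NDK:-}" ]; then
        echo "ANDROID_NDK is not set" >&2
        return 1
    fi
    toolchain="${ANDROID_NDK}/build/cmake/android.toolchain.cmake"
    if [ ! -f "$toolchain" ]; then
        echo "toolchain file not found: $toolchain" >&2
        return 1
    fi
    printf '%s\n' "$toolchain"
}

# Example use before configuring (left commented out so sourcing has no side effects):
#   toolchain=$(ndk_toolchain_file) || exit 1
#   cmake -B build -DCMAKE_TOOLCHAIN_FILE="$toolchain" -DANDROID_ABI=arm64-v8a
```

Checking the path up front turns a missing or mis-versioned NDK into a one-line error instead of a CMake configure failure further into the log.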