Releases: ggml-org/llama.cpp
b5606
b5604
SYCL: Implement few same quantized type copy kernels (#13739)
* SYCL: Implement few same quantized type copy kernels
* Use memcpy for copying contiguous tensors
* feat(sycl): add contiguous tensor copy support and device checks
  Adds a memcpy path for contiguous tensors of the same type to optimize data transfer. Updates device support checks to recognize contiguous tensor operations, improving compatibility and performance.
* refactor: replace specific block copy functions with a template
  Replaces multiple redundant block copy functions (e.g., cpy_block_q8_0_q8_0, cpy_block_q5_0_q5_0) with a single templated function cpy_blck_q_q. This reduces code duplication by using a generic template that works for any block type, improving maintainability while preserving the same functionality. The template is instantiated with specific block types (e.g., block_q8_0) where needed.
* Exclude BF16 support for COPY tensors for now
* perf: adjust SYCL copy kernel block sizes for efficiency
  Use ceil_div to ensure full element coverage and update nd_range parameters to better align with SYCL block sizes, improving parallelism and device utilization in copy operations.
b5603
llama : fix llama_model_chat_template with template name (LLM_KV with…
b5602
llama : deprecate llama_kv_self_ API (#14030)
* llama : deprecate llama_kv_self_ API
* llama : allow llama_memory_(nullptr)
* memory : add flag for optional data clear in llama_memory_clear
b5601
context : fix SWA-related warning for multiple sequences (#14045)
b5600
llama : support multiple classifier outputs and labels (#13940)
b5598
vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs…
b5596
memory : migrate from llama_kv_cache to more generic llama_memory (#1…
b5595
llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WI…
b5593
vocab : warn about missing mask token (#14022)