taronaeo committed on
Commit 449e1a4 · 1 Parent(s): 6e3a7b6

ggml: initial IBM zDNN backend (llama/14975)

* ggml-zdnn: initial backend impl

Signed-off-by: Aaron Teo <[email protected]>

ggml-zdnn: temp change z17 to arch15

Signed-off-by: Aaron Teo <[email protected]>

ggml-zdnn: fix build bugs

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: tensor->extra logging check

Signed-off-by: Aaron Teo <[email protected]>

ggml-zdnn: add layout name mapping, ztensor information

Signed-off-by: Aaron Teo <[email protected]>

ggml-zdnn: separate logging into its own line

Signed-off-by: Aaron Teo <[email protected]>

ggml-zdnn: add shape comparison

Signed-off-by: Aaron Teo <[email protected]>

ggml-zdnn: add ggml_tensor shape log

Signed-off-by: Aaron Teo <[email protected]>

ggml-zdnn: fix incorrect shape logging

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add output buffer check

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: run compute and store into tensor->extra

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add set_tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add more loggers

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: update set_tensor logging to check only for matmul

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: last working matmul version

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add comments to prevent accidentally deleting lines

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: support op out_prod

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: update op out_prod to use tensor->extra

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: rewrite the backend implementation

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: bugfix new impl

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix compiler warnings and bugfixes

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: test ztensor finding in init_tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: implement at least 1 op to test

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: assign tensor->extra to buffer

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add check for view tensors to prevent init_tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: rework init_tensor to create new buffers

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: switch to std vector instead of array

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: switch buffers back and set to arbitrary number

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: impl init_tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: update supports_op matmul matrix

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix incorrect ztensor shape, reduce memory padding

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: code clean up

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: impl matmul

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix compiler error missing type

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix missing data transform call

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add bias init_tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: tighten memory usage, change string allocation

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add bias ztensor and data free

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add bias data transform

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add more debug info for extra buffer transform

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add logger to check if mat mul ops go through set_tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: activate bias transform in matmul

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: move weights transform into mulmat

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add more safeguards in matmul

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix sequencing of transforms

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: bugfix transform ztensor vs origtensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: figure out why sigtrap is happening

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix sigsegv

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: move everything back to local declaration

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: move bias data to local also

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: bring back working matmul

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: rewrite into mre

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix missing vector import

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix missing vector import in header

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: attempt to fix sigsegv

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix missing load tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix invalid ztensor buffer release

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add logging to debug free buffer

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: remove free_buffer debug info

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add parmblkformat detections

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add nnpa installed detection

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add zdnn_init call for static libs

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add init_tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: attempt at fixing invalid buffer

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: switch to using deque to fix pointer deref problem

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add weights logging to check

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: attempt to use unique ptr

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add tensor to pre_tfm_desc logging

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add inputs logging

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: disable op_none initialisation for testing

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix missing return from init_tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: load ztensors in cgraph exec

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: work on moving output ztensor as well

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: disable logging and breakpoints for full test

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: attempt at manually changing the layout

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: attempt at using default nhwc format instead

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: disable global load ztensor for now

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix erroneous output load tensor

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: add guards to prevent loading ztensor if transformed

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: code cleanup

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: bring load ztensor back to init routine

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: code clean up

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix ztensor deallocation abort

stabilise ggml <-> zdnn api

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: clean up matmul selection

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: clean up project structure

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: update documentation, prepare for upstream

Signed-off-by: Aaron Teo <[email protected]>

* chore: add codeowners

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: disable batched matmul

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: attempt at fixing tensor views during matmul

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: deny all view tensors directly

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix pr comments

Signed-off-by: Aaron Teo <[email protected]>

* docs: update ops docs for zdnn

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: redo test-backend-ops for ops.md

Signed-off-by: Aaron Teo <[email protected]>

* ggml-zdnn: fix typo in build-s390x.md

Signed-off-by: Aaron Teo <[email protected]>

* codeowners: remove taronaeo for now

Signed-off-by: Aaron Teo <[email protected]>

* Revert "codeowners: remove taronaeo for now"

This reverts commit 411ea4ed78d08778967bd0bd33a6538cfcbe082f.

* ggml-zdnn: remove unused ggml_zdnn macro

Signed-off-by: Aaron Teo <[email protected]>

---------

Signed-off-by: Aaron Teo <[email protected]>

ggml/CMakeLists.txt CHANGED
@@ -188,6 +188,7 @@ option(GGML_VULKAN_VALIDATE "ggml: enable Vulkan validation"
  option(GGML_VULKAN_RUN_TESTS "ggml: run Vulkan tests" OFF)
  option(GGML_WEBGPU "ggml: use WebGPU" OFF)
  option(GGML_WEBGPU_DEBUG "ggml: enable WebGPU debug output" OFF)
+ option(GGML_ZDNN "ggml: use zDNN" OFF)
  option(GGML_METAL "ggml: use Metal" ${GGML_METAL_DEFAULT})
  option(GGML_METAL_USE_BF16 "ggml: use bfloat if available" OFF)
  option(GGML_METAL_NDEBUG "ggml: disable Metal debugging" OFF)
ggml/include/ggml-zdnn.h ADDED
@@ -0,0 +1,16 @@
+ #pragma once
+
+ #include "ggml.h"
+ #include "ggml-backend.h"
+
+ #ifdef __cplusplus
+ extern "C" {
+ #endif
+
+ GGML_BACKEND_API ggml_backend_t ggml_backend_zdnn_init(void);
+
+ GGML_BACKEND_API ggml_backend_reg_t ggml_backend_zdnn_reg(void);
+
+ #ifdef __cplusplus
+ }
+ #endif
ggml/src/CMakeLists.txt CHANGED
@@ -382,6 +382,7 @@ ggml_add_backend(RPC)
  ggml_add_backend(SYCL)
  ggml_add_backend(Vulkan)
  ggml_add_backend(WebGPU)
+ ggml_add_backend(zDNN)
  ggml_add_backend(OpenCL)

  foreach (target ggml-base ggml)
ggml/src/ggml-backend-reg.cpp CHANGED
@@ -49,6 +49,10 @@
  #include "ggml-webgpu.h"
  #endif

+ #ifdef GGML_USE_ZDNN
+ #include "ggml-zdnn.h"
+ #endif
+
  #ifdef GGML_USE_OPENCL
  #include "ggml-opencl.h"
  #endif
@@ -180,6 +184,9 @@ struct ggml_backend_registry {
  #ifdef GGML_USE_WEBGPU
  register_backend(ggml_backend_webgpu_reg());
  #endif
+ #ifdef GGML_USE_ZDNN
+ register_backend(ggml_backend_zdnn_reg());
+ #endif
  #ifdef GGML_USE_OPENCL
  register_backend(ggml_backend_opencl_reg());
  #endif
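
Note: the two hunks above follow the same compile-time guard used by the neighbouring backends: the header is included and the backend registered only when `GGML_USE_ZDNN` is defined. The standalone sketch below illustrates that pattern with mock types. Everything prefixed `mock_` or `MOCK_` is a hypothetical stand-in invented for this illustration; it is not the real ggml API, and the real registry may differ in detail.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumption: in a real build the build system would define this macro
// (compare GGML_USE_ZDNN). It is defined here only so the sketch is standalone.
#define MOCK_USE_ZDNN 1

// Hypothetical stand-in for a backend registration record.
struct mock_backend_reg {
    std::string name;
};

// Hypothetical stand-in for ggml_backend_zdnn_reg().
inline mock_backend_reg * mock_backend_zdnn_reg() {
    static mock_backend_reg reg{"zdnn-mock"};
    return &reg;
}

// Hypothetical stand-in for a backend that is always compiled in.
inline mock_backend_reg * mock_backend_cpu_reg() {
    static mock_backend_reg reg{"cpu-mock"};
    return &reg;
}

struct mock_backend_registry {
    std::vector<mock_backend_reg *> backends;

    mock_backend_registry() {
#ifdef MOCK_USE_ZDNN
        // Compiled in only when the macro is defined at build time.
        register_backend(mock_backend_zdnn_reg());
#endif
        register_backend(mock_backend_cpu_reg());
    }

    void register_backend(mock_backend_reg * reg) {
        assert(reg != nullptr);
        backends.push_back(reg);
    }

    // Returns true if a backend with the given name was registered.
    bool has(const std::string & name) const {
        for (const mock_backend_reg * reg : backends) {
            if (reg->name == name) {
                return true;
            }
        }
        return false;
    }
};
```

Constructing a `mock_backend_registry` with the macro defined yields two entries; removing the `#define` leaves only the always-on entry, which mirrors how a build without `GGML_ZDNN` would never reference the zDNN symbols.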
ggml/src/ggml-cpu/CMakeLists.txt CHANGED
@@ -460,7 +460,7 @@ function(ggml_add_cpu_backend_variant_impl tag_name)
  # NOTE: Only available from GCC 15.1.0 onwards. Any z17 machine with compile issues must first verify their GCC version.
  # binutils must also be updated to the latest for the -march=z17 flag to work. Otherwise, use -march=arch15.
  message(STATUS "z17 target")
- list(APPEND ARCH_FLAGS -march=z17)
+ list(APPEND ARCH_FLAGS -march=arch15)
  else()
  message(STATUS "Unknown target")
  message(WARNING "Unknown target. If you are compiling for z14 and earlier, you might have to add -DGGML_VXE=OFF.")