ggml: initial IBM zDNN backend (llama/14975)
* ggml-zdnn: inital backend impl
Signed-off-by: Aaron Teo <[email protected]>
ggml-zdnn: temp change z17 to arch15
Signed-off-by: Aaron Teo <[email protected]>
ggml-zdnn: fix build bugs
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: tensor->extra logging check
Signed-off-by: Aaron Teo <[email protected]>
ggml-zdnn: add layout name mapping, ztensor information
Signed-off-by: Aaron Teo <[email protected]>
ggml-zdnn: separate logging into its own line
Signed-off-by: Aaron Teo <[email protected]>
ggml-zdnn: add shape comparison
Signed-off-by: Aaron Teo <[email protected]>
ggml-zdnn: add ggml_tensor shape log
Signed-off-by: Aaron Teo <[email protected]>
ggml-zdnn: fix incorrect shape logging
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add output buffer check
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: run compute and store into tensor->extra
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add set_tensor
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add more loggers
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: update set_tensor logging to check only for matmul
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: last working matmul version
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add comments to prevent accidentally deleting lines
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: support op out_prod
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: update op out_prod to use tensor->extra
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: rewrite the backend implementation
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: bugfix new impl
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: fix compiler warnings and bugfixes
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: test ztensor finding in init_tensor
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: implement at least 1 op to test
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: assign tensor->extra to buffer
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add check for view tensors to prevent init_tensor
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: rework init_tensor to create new buffers
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: switch to std vector instead of array
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: switch buffers back and set to arbitrary number
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: impl init_tensor
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: update supports_op matmul matrix
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: fix incorrect ztensor shape, reduce memory padding
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: code clean up
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: impl matmul
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: fix compiler error missing type
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: fix missing data transform call
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add bias init_tensor
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: tighten memory usage, change string allocation
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add bias ztensor and data free
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add bias data transform
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add more debug info for extra buffer transform
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add logger to check if mat mul ops go through set_tensor
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: activate bias transform in matmul
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: move weights transform into mulmat
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add more safeguards in matmul
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: fix sequencing of transforms
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: bugfix transform ztensor vs origtensor
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: figure out why sigtrap is happening
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: fix sigsegv
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: move everything back to local declaration
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: move bias data to local also
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: bring back working matmul
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: rewrite into mre
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: fix missing vector import
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: fix missing vector import in header
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: attempt to fix sigsegv
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: fix missing load tensor
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: fix invalid ztensor buffer release
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add logging to debug free buffer
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: remove free_buffer debug info
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add parmblkformat detections
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add nnpa installed detection
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add zdnn_init call for static libs
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add init_tensor
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: attempt at fixing invalid buffer
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: switch to using deque to fix pointer deref problem
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add weights logging to check
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: attempt to use unique ptr
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add tensor to pre_tfm_desc logging
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add inputs logging
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: disable op_none initialisation for testing
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: fix missing return from init_tensor
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: load ztensors in cgraph exec
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: work on moving output ztensor as well
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: disable logging and breakpoints for full test
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: attempt at manually changing the layout
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: attempt at using default nwhc format instead
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: disable global load ztensor for now
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: fix errorenous output load tensor
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: add guards to prevent loading ztensor if transformed
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: code cleanup
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: bring load ztensor back to init routine
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: code clean up
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: fix ztensor deallocation abort
stabilise ggml <-> zdnn api
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: clean up matmul selection
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: clean up project structure
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: update documentation, prepare for upstream
Signed-off-by: Aaron Teo <[email protected]>
* chore: add codeowners
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: disable batched matmul
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: attempt at fixing tensor views during matmul
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: deny all view tensors directly
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: fix pr comments
Signed-off-by: Aaron Teo <[email protected]>
* docs: update ops docs for zdnn
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: redo test-backend-ops for ops.md
Signed-off-by: Aaron Teo <[email protected]>
* ggml-zdnn: fix typo in build-s390x.md
Signed-off-by: Aaron Teo <[email protected]>
* codeowners: remove taronaeo for now
Signed-off-by: Aaron Teo <[email protected]>
* Revert "codeowners: remove taronaeo for now"
This reverts commit 411ea4ed78d08778967bd0bd33a6538cfcbe082f.
* ggml-zdnn: remove unused ggml_zdnn macro
Signed-off-by: Aaron Teo <[email protected]>
---------
Signed-off-by: Aaron Teo <[email protected]>
- ggml/CMakeLists.txt +1 -0
- ggml/include/ggml-zdnn.h +16 -0
- ggml/src/CMakeLists.txt +1 -0
- ggml/src/ggml-backend-reg.cpp +7 -0
- ggml/src/ggml-cpu/CMakeLists.txt +1 -1
--- a/ggml/CMakeLists.txt
+++ b/ggml/CMakeLists.txt
@@ -188,6 +188,7 @@ option(GGML_VULKAN_VALIDATE "ggml: enable Vulkan validation"
 option(GGML_VULKAN_RUN_TESTS "ggml: run Vulkan tests" OFF)
 option(GGML_WEBGPU "ggml: use WebGPU" OFF)
 option(GGML_WEBGPU_DEBUG "ggml: enable WebGPU debug output" OFF)
+option(GGML_ZDNN "ggml: use zDNN" OFF)
 option(GGML_METAL "ggml: use Metal" ${GGML_METAL_DEFAULT})
 option(GGML_METAL_USE_BF16 "ggml: use bfloat if available" OFF)
 option(GGML_METAL_NDEBUG "ggml: disable Metal debugging" OFF)
--- /dev/null
+++ b/ggml/include/ggml-zdnn.h
@@ -0,0 +1,16 @@
+#pragma once
+
+#include "ggml.h"
+#include "ggml-backend.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+GGML_BACKEND_API ggml_backend_t ggml_backend_zdnn_init(void);
+
+GGML_BACKEND_API ggml_backend_reg_t ggml_backend_zdnn_reg(void);
+
+#ifdef __cplusplus
+}
+#endif
--- a/ggml/src/CMakeLists.txt
+++ b/ggml/src/CMakeLists.txt
@@ -382,6 +382,7 @@ ggml_add_backend(RPC)
 ggml_add_backend(SYCL)
 ggml_add_backend(Vulkan)
 ggml_add_backend(WebGPU)
+ggml_add_backend(zDNN)
 ggml_add_backend(OpenCL)
 
 foreach (target ggml-base ggml)
--- a/ggml/src/ggml-backend-reg.cpp
+++ b/ggml/src/ggml-backend-reg.cpp
@@ -49,6 +49,10 @@
 #include "ggml-webgpu.h"
 #endif
 
+#ifdef GGML_USE_ZDNN
+#include "ggml-zdnn.h"
+#endif
+
 #ifdef GGML_USE_OPENCL
 #include "ggml-opencl.h"
 #endif
@@ -180,6 +184,9 @@ struct ggml_backend_registry {
 #ifdef GGML_USE_WEBGPU
 register_backend(ggml_backend_webgpu_reg());
 #endif
+#ifdef GGML_USE_ZDNN
+register_backend(ggml_backend_zdnn_reg());
+#endif
 #ifdef GGML_USE_OPENCL
 register_backend(ggml_backend_opencl_reg());
 #endif
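For readers unfamiliar with the pattern in the ggml-backend-reg.cpp change, where `register_backend(ggml_backend_zdnn_reg())` is wrapped in `#ifdef GGML_USE_ZDNN`, the following is a minimal, self-contained sketch of compile-time conditional backend registration. It is not the ggml implementation: every `demo_`-prefixed name and the `DEMO_USE_ZDNN` macro are hypothetical stand-ins, and the macro is defined inside the file only so the sketch compiles on its own (in a real build such a definition would be expected to come from the build system).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a build-time definition such as GGML_USE_ZDNN.
// Defined here only so the sketch is self-contained.
#define DEMO_USE_ZDNN

// Hypothetical stand-in for a backend registration handle: just a name.
struct demo_backend_reg {
    std::string name;
};

#ifdef DEMO_USE_ZDNN
// Stand-in for a function like ggml_backend_zdnn_reg(); it only exists when
// the backend is compiled in, which is why the call site needs the same guard.
static demo_backend_reg * demo_backend_zdnn_reg() {
    static demo_backend_reg reg{"zDNN"};
    return &reg;
}
#endif

// A second hypothetical backend that is always compiled in.
static demo_backend_reg * demo_backend_cpu_reg() {
    static demo_backend_reg reg{"CPU"};
    return &reg;
}

struct demo_backend_registry {
    std::vector<demo_backend_reg *> backends;

    // The constructor registers every backend that was enabled at compile time.
    demo_backend_registry() {
#ifdef DEMO_USE_ZDNN
        register_backend(demo_backend_zdnn_reg());
#endif
        register_backend(demo_backend_cpu_reg());
    }

    // Ignore null handles so a backend that fails to provide one is skipped.
    void register_backend(demo_backend_reg * reg) {
        if (reg == nullptr) {
            return;
        }
        backends.push_back(reg);
    }
};

// Returns the names of the registered backends in registration order.
static std::vector<std::string> demo_registered_names() {
    demo_backend_registry registry;
    std::vector<std::string> names;
    for (const demo_backend_reg * reg : registry.backends) {
        names.push_back(reg->name);
    }
    return names;
}
```

Removing the `#define DEMO_USE_ZDNN` line drops both the stand-in function and its registration call, so the registry would then contain only the always-present entry. This mirrors why the include and the registration call in the diff sit behind the same guard.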
--- a/ggml/src/ggml-cpu/CMakeLists.txt
+++ b/ggml/src/ggml-cpu/CMakeLists.txt
@@ -460,7 +460,7 @@ function(ggml_add_cpu_backend_variant_impl tag_name)
 # NOTE: Only available from GCC 15.1.0 onwards. Any z17 machine with compile issues must first verify their GCC version.
 # binutils must also be updated to the latest for the -march=z17 flag to work. Otherwise, use -march=arch15.
 message(STATUS "z17 target")
-list(APPEND ARCH_FLAGS -march=
+list(APPEND ARCH_FLAGS -march=arch15)
 else()
 message(STATUS "Unknown target")
 message(WARNING "Unknown target. If you are compiling for z14 and earlier, you might have to add -DGGML_VXE=OFF.")