Request: Tensor alignment (256) for llama.cpp quantization
#165
opened by kusanagi-hf
Hi, thank you for this great model!
Many users run models via llama.cpp, whose K-quant formats require the first dimension of certain weight tensors (e.g., the attention key/query projections and the token embeddings) to be a multiple of 256, the K-quant super-block size.
If this requirement is not met, the affected weights cannot be quantized at the intended bit-width, and llama.cpp falls back to a wider format for them, defeating the purpose of the lower-bit quantization.
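For reference, here is a minimal sketch of how one might check which tensors would be affected before converting. It assumes the checkpoint is shipped as `.safetensors` shards in the current directory; the file pattern and the 256 block size (`QK_K` in llama.cpp) are assumptions on my part, not something specific to this model.

```python
# Sketch: flag tensors whose row size is not a multiple of the K-quant
# super-block size (256), i.e., tensors that would trigger llama.cpp's
# quantization fallback. Assumes .safetensors shards in the working directory.
import glob
from safetensors import safe_open

QK_K = 256  # K-quant super-block size in llama.cpp

for shard in sorted(glob.glob("*.safetensors")):
    with safe_open(shard, framework="pt") as f:
        for name in f.keys():
            shape = f.get_slice(name).get_shape()
            # In the GGUF layout the last PyTorch dimension becomes the row
            # size (ne[0]), which is the dimension split into 256-wide blocks.
            row_size = shape[-1]
            if row_size % QK_K != 0:
                print(f"{shard}:{name}: shape={shape}, "
                      f"row size {row_size} is not a multiple of {QK_K}")
```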
For context, I plan to quantize at a lower bit-width than MXFP4, so this alignment is particularly important for my use case.
Would it be possible to provide a variant where these tensor dimensions are aligned to 256?
Thank you for your consideration!