Request: Tensor alignment (256) for llama.cpp quantization

#165
opened by kusanagi-hf

Hi, thank you for this great model!

Many users run models via llama.cpp, whose K-quants use a super-block size of 256, so the first dimension (row size) of certain weight tensors (e.g., attention key and query projections, token embeddings) must be a multiple of 256.
When a tensor does not meet this requirement, it cannot be quantized at the intended bit-width, and llama.cpp falls back to a wider format for that tensor, defeating the purpose of the quantization.
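To make the constraint concrete, here is a minimal sketch of the alignment check, assuming `QK_K = 256` (the K-quants super-block size in llama.cpp); the example dimensions are hypothetical, not this model's actual shapes:

```python
QK_K = 256  # super-block size used by llama.cpp K-quants

def kquant_compatible(row_size: int) -> bool:
    """A tensor row can be K-quantized only if its length is a multiple of QK_K."""
    return row_size % QK_K == 0

def aligned_row_size(row_size: int) -> int:
    """Smallest multiple of QK_K that fits the row (what a 256-aligned variant would use)."""
    return ((row_size + QK_K - 1) // QK_K) * QK_K

# Hypothetical dimensions for illustration:
for dim in (2880, 4096):
    print(dim, kquant_compatible(dim), aligned_row_size(dim))
```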

For context, I plan to quantize at a lower bit-width than MXFP4, so this alignment matters a great deal for my use case.

Would it be possible to provide a variant where these tensor dimensions are aligned to 256?

Thank you for your consideration!
