deepseek-7b-math-code-lagrange-optimal

Model merge with mixing weights λ optimized via Hermite interpolation.
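
For reference, the merged checkpoint loads like any standard Hugging Face model. The snippet below is a minimal sketch using the stock `transformers` API; the prompt is illustrative, not taken from this card.

```python
# Minimal usage sketch (standard transformers API).
# device_map="auto" requires the `accelerate` package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lejelly/deepseek-7b-math-code-lagrange-optimal"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Write a Python function that computes n!."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```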

Merge Configuration

| Parameter | Value |
| --- | --- |
| Method | Hermite interpolation (Phase 2 optimized) |
| λ | [0.499256, 0.500744] |
| dtype | torch.float16 |

Per-model weights:
  • Model 0 (deepseek-ai/deepseek-math-7b-instruct): λ = 0.499256
  • Model 1 (deepseek-ai/deepseek-coder-7b-instruct-v1.5): λ = 0.500744

Tokenizer

Union tokenizer (mergekit-style): vocab size = 100016
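
A union tokenizer keeps every token that appears in either parent vocabulary. As a rough check of the reported size, the sketch below unions the two parents' vocabularies; it does not reproduce mergekit's full token-remapping logic.

```python
# Rough check of the union vocabulary size (sketch only).
from transformers import AutoTokenizer

tok_math = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-math-7b-instruct")
tok_code = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-7b-instruct-v1.5")

# Union of the two token sets; expected to match the merged vocab size.
union_vocab = set(tok_math.get_vocab()) | set(tok_code.get_vocab())
print(len(union_vocab))
```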

Formula

θ* = Σ_k λ_k θ_k
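
Applied to PyTorch checkpoints, the formula is a per-tensor weighted sum over the models' state dicts. The sketch below is illustrative only: it assumes both checkpoints already share parameter names and shapes (e.g., after embedding alignment for the union tokenizer), and `merge_state_dicts` is a hypothetical helper, not part of any released script.

```python
# Sketch of θ* = Σ_k λ_k θ_k applied to state dicts (hypothetical helper).
import torch

def merge_state_dicts(state_dicts, lambdas):
    """Element-wise weighted sum of state dicts with matching keys/shapes."""
    merged = {}
    for name in state_dicts[0]:
        # Accumulate in float32 for numerical stability, store as float16.
        merged[name] = sum(
            lam * sd[name].to(torch.float32)
            for lam, sd in zip(lambdas, state_dicts)
        ).to(torch.float16)
    return merged

# Mixing weights from the table above.
lambdas = [0.499256, 0.500744]
```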

The mixing weights λ were optimized by minimizing the Hermite polynomial approximation of the loss function (see Phase 2).
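
Phase 2 details are not given on this card, but the idea can be sketched as follows: parameterize λ = (1 − t, t), fit a cubic Hermite interpolant to a few (t, loss, dloss/dt) measurements, and take its minimizer over [0, 1]. All sample values below are hypothetical placeholders, not the actual Phase 2 measurements.

```python
# Sketch of λ selection via a Hermite approximation of the loss (hypothetical data).
import numpy as np
from scipy.interpolate import CubicHermiteSpline
from scipy.optimize import minimize_scalar

# Interpolation coefficient t parameterizes λ = (1 - t, t).
t_samples = np.array([0.0, 0.5, 1.0])   # hypothetical sample points
loss = np.array([1.32, 1.18, 1.29])     # hypothetical validation losses
dloss = np.array([-0.6, 0.05, 0.55])    # hypothetical dloss/dt at the samples

hermite = CubicHermiteSpline(t_samples, loss, dloss)
result = minimize_scalar(hermite, bounds=(0.0, 1.0), method="bounded")
t_star = float(result.x)
print("λ =", (1.0 - t_star, t_star))
```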
