Multi-Block Diffusion Language Models (MBD-LMs)

This repository contains the model weights for Multi-Block Diffusion Language Models (MBD-LMs), presented in the paper "Multi-Block Diffusion Language Models".

Introduction

Block Diffusion Language Models (BD-LMs) bring KV caching and flexible-length generation to diffusion-based text generation. MBD-LMs extend them from Single-Block Diffusion (SingleBD), which decodes one block at a time, to Multi-Block Diffusion (MultiBD), where a running set of consecutive blocks is decoded concurrently to gain inter-block parallelism.
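
The difference between the two decoding modes can be illustrated with a toy step counter. This is only a sketch under simplifying assumptions, not the scheduler used by MBD-LMs or Diffulex: the function name decode_steps, its parameters, and the rule that every active block unmasks a fixed number of positions per step are all hypothetical.

```python
from collections import deque


def decode_steps(num_blocks, block_size, running_set_size, tokens_per_step=1):
    """Count denoising steps for a toy block-wise decoder.

    Assumption (not from the paper): every block in the running set
    unmasks `tokens_per_step` positions at each step.
    """
    remaining = [block_size] * num_blocks  # masked positions left per block
    running = deque()                      # indices of blocks being decoded
    next_block = 0
    steps = 0
    while next_block < num_blocks or running:
        # Top up the running set with the next consecutive blocks.
        while len(running) < running_set_size and next_block < num_blocks:
            running.append(next_block)
            next_block += 1
        # One step: each block in the running set makes progress.
        for b in running:
            remaining[b] = max(0, remaining[b] - tokens_per_step)
        steps += 1
        # Retire finished blocks from the front only, so the decoded
        # prefix always stays contiguous.
        while running and remaining[running[0]] == 0:
            running.popleft()
    return steps


single = decode_steps(num_blocks=4, block_size=4, running_set_size=1)
multi = decode_steps(num_blocks=4, block_size=4, running_set_size=2)
print(single, multi)
```

With a running set of size 1 the toy reduces to SingleBD and needs 16 steps for four blocks of four positions; a running set of size 2 needs 8. Real speedups depend on how confidently later blocks can be decoded before earlier ones are finished, which this sketch does not model.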

This model is obtained by post-training BD-LMs with Multi-block Teacher Forcing (MultiTF), which integrates teacher forcing and diffusion forcing by training on bounded noise groups conditioned on clean prefixes.
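
One possible reading of "bounded noise groups conditioned on clean prefixes" is sketched below. It is an illustration only, not the MultiTF implementation: the function make_noise_group_example, the MASK_ID value, the per-block mask ratios, and the choice to truncate the sequence at the end of the noise group are all assumptions made for this example.

```python
import random

MASK_ID = -1  # hypothetical mask-token id, chosen only for this sketch


def make_noise_group_example(tokens, block_size, prefix_blocks,
                             group_blocks, mask_ratios, seed=0):
    """Build one toy training example.

    The first `prefix_blocks` blocks stay clean (teacher forcing). The next
    `group_blocks` blocks form the bounded noise group, where block g is
    masked with probability `mask_ratios[g]` per position (a stand-in for
    per-block noise levels). Tokens after the group are dropped.
    """
    assert len(mask_ratios) == group_blocks
    rng = random.Random(seed)
    prefix_end = prefix_blocks * block_size
    group_end = min(len(tokens), prefix_end + group_blocks * block_size)

    inputs = list(tokens[:group_end])
    loss_mask = [False] * group_end  # True where the loss would be computed
    for g in range(group_blocks):
        start = prefix_end + g * block_size
        for pos in range(start, min(start + block_size, group_end)):
            if rng.random() < mask_ratios[g]:
                inputs[pos] = MASK_ID
                loss_mask[pos] = True
    return inputs, loss_mask


tokens = list(range(100, 116))  # 4 blocks of 4 tokens
inputs, loss_mask = make_noise_group_example(
    tokens, block_size=4, prefix_blocks=1, group_blocks=2,
    mask_ratios=[0.0, 1.0],
)
print(inputs)
print(loss_mask)
```

In this call the first block is the clean prefix, the second block is left unmasked (ratio 0.0), the third is fully masked (ratio 1.0), and the fourth block falls outside the bounded group and is dropped.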

For setup instructions, training configurations, and the optimized inference engine, please refer to the official repository and the Diffulex engine.

Citation

@article{jin2026multiblock,
  title={Multi-Block Diffusion Language Models},
  author={Yijie Jin and Jiajun Xu and Yuxuan Liu and Chenkai Xu and Yi Tu and Jiajun Li and Dandan Tu and Xiaohui Yan and Kai Yu and Pengfei Liu and Zhijie Deng},
  journal={arXiv preprint arXiv:2606.29215},
  year={2026}
}

Model size: 16B params (Safetensors, tensor type BF16)