# Z-Image Turbo Control Unified
This repository hosts the **Z-Image Turbo Control Unified** model, a specialized architecture that merges the Z-Image Turbo base transformer with ControlNet capabilities into a single, cohesive model.

Unlike traditional pipelines, where ControlNet is an external add-on, this model integrates the control layers directly into the transformer structure. This enables unified GGUF quantization: the entire merged architecture (base + control) can be quantized (e.g., Q4_K_M) and run on consumer hardware with limited VRAM.
## 🔥 Installation

Create and activate a virtual environment, then install the dependencies from the provided requirements file:

```shell
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
Note: This repository contains a `diffusers_local` folder with the custom pipelines required to run this specific architecture.
## 🚀 Usage
We provide two ready-to-use scripts for inference, depending on your hardware capabilities and requirements.
### Option 1: Low VRAM (GGUF) - Recommended

Script: `infer_gguf.py`

Use this version if you have limited VRAM (e.g., 6-8 GB) or want to save memory. It loads the model from the quantized GGUF file (`z_image_turbo_control_unified_q4_k_m.gguf`).

To run:

```shell
python infer_gguf.py
```
Key features of this mode:

- Loads the unified transformer from a single 4-bit quantized file.
- Uses `GGUFQuantizationConfig` for efficient computation.
- Enables aggressive group offloading to fit large models on consumer GPUs.
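To get a feel for why 4-bit quantization makes the difference on 6-8 GB cards, here is a rough back-of-the-envelope weight-storage calculation. The 6B parameter count is a hypothetical placeholder (not the actual size of this model), and ~4.5 bits/weight is only an approximation of Q4_K_M's average:

```python
# Back-of-the-envelope VRAM estimate for transformer weights.
# NOTE: the parameter count is a HYPOTHETICAL placeholder, and
# ~4.5 bits/weight is an approximation of Q4_K_M's average rate.

def weight_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a given precision."""
    return num_params * bits_per_weight / 8 / 2**30

params = 6e9  # hypothetical 6B-parameter unified transformer

bf16 = weight_gib(params, 16)     # full BF16 precision
q4_k_m = weight_gib(params, 4.5)  # ~4.5 bits/weight on average

print(f"BF16:   {bf16:.1f} GiB")    # ≈ 11.2 GiB
print(f"Q4_K_M: {q4_k_m:.1f} GiB")  # ≈ 3.1 GiB
```

Weights are only part of the VRAM budget (activations, the text encoder, and the VAE add more), which is why the GGUF script also relies on group offloading.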
### Option 2: High Precision (Diffusers/BF16)

Script: `infer_pretrained.py`

Use this version if you have ample VRAM (e.g., 24 GB+) and want to run the model in standard BFloat16 precision without quantization.

To run:

```shell
python infer_pretrained.py
```
Key features of this mode:

- Loads the model using the standard `from_pretrained` directory structure.
- Maintains full floating-point precision.
## 🖼️ Examples
### HED
Example 1 (without CFG):

- Steps: 9
- CFG: 0
- Control Scale: 0.7
- Prompt: A man holding a bottle
Example 2 (with CFG):

- Steps: 9
- CFG: 0
- Control Scale: 0.7
- Prompt: raw photo, portrait of a handsome Asian man sitting at a wooden table, holding a green glass bottle, wearing a black sweater, wristwatch, highly detailed skin texture, realistic pores, serious gaze, soft cinematic lighting, rim lighting, balanced exposure, 8k uhd, dslr, sharp focus, wood grain texture.
- Negative prompt: underexposed, crushed blacks, too dark, heavy shadows, makeup, smooth skin, plastic, wax, cartoon, illustration, distorted hands, bad anatomy, blur, haze, flat lighting.
### DEPTH
Example 3 (without CFG):

- Steps: 9
- CFG: 0
- Control Scale: 0.7
- Prompt: A cat
## 🛠️ Model Configuration
The inference scripts are pre-configured with parameters optimized for the Turbo nature of this model:
- Inference Steps: 9 steps (Fast generation).
- Guidance Scale: 0.0 (Turbo models do not use CFG).
- Conditioning Scale: 0.7 (Recommended strength for ControlNet).
- Shift: 3.0 (Scheduler shift parameter).
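The defaults above can be collected into a single mapping. The key names below are assumptions modeled on common diffusers pipeline arguments, not the verified signature of the custom `ZImageControlUnifiedPipeline`:

```python
# Illustrative defaults mirroring the values documented above.
# Key names are ASSUMPTIONS based on common diffusers conventions.
TURBO_DEFAULTS = {
    "num_inference_steps": 9,              # fast Turbo generation
    "guidance_scale": 0.0,                 # Turbo models do not use CFG
    "controlnet_conditioning_scale": 0.7,  # recommended control strength
    "shift": 3.0,                          # scheduler shift parameter
}

# A hypothetical call site would unpack these defaults, e.g.:
# image = pipe(prompt, control_image=control, **TURBO_DEFAULTS).images[0]
```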
## 📁 Repository Structure
- `z_image_turbo_control_unified_q4_k_m.gguf`: The unified, quantized model weights.
- `infer_gguf.py`: Script for running GGUF inference.
- `infer_pretrained.py`: Script for running standard Diffusers inference.
- `diffusers_local/`: Custom pipeline code (`ZImageControlUnifiedPipeline`) and transformer logic.
- `requirements.txt`: Python dependencies.