# Z-Image Turbo Control Unified
This repository hosts the **Z-Image Turbo Control Unified** model, a specialized architecture that merges the Z-Image Turbo base transformer with ControlNet capabilities into a single, cohesive model.

Unlike traditional pipelines, where ControlNet is an external add-on, this model integrates the control layers directly into the transformer structure. This enables unified GGUF quantization: the entire merged architecture (base + control) can be quantized (e.g., Q4_K_M) and run on consumer hardware with limited VRAM.
## 🔥 Installation

Create and activate a virtual environment, then install the dependencies from the provided requirements file:

```shell
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
Note: This repository contains a `diffusers_local` folder with the custom pipelines required to run this specific architecture.
## 🚀 Usage
We provide two ready-to-use scripts for inference, depending on your hardware capabilities and requirements.
### Option 1: Low VRAM (GGUF) - Recommended

Script: `infer_gguf.py`

Use this version if you have limited VRAM (e.g., 6-8 GB) or want to save memory. It loads the model from the quantized GGUF file (`z_image_turbo_control_unified_q4_k_m.gguf`).

To run:

```shell
python infer_gguf.py
```
Key features of this mode:

- Loads the unified transformer from a single 4-bit quantized file.
- Uses `GGUFQuantizationConfig` for efficient computation.
- Enables aggressive group offloading to fit large models on consumer GPUs.
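To get a feel for why 4-bit quantization makes the difference on 6-8 GB cards, here is a rough back-of-the-envelope weight-storage calculation. The 6B parameter count is a hypothetical placeholder (not the actual size of this model), and ~4.5 bits/weight is only an approximation of Q4_K_M's average:

```python
# Back-of-the-envelope VRAM estimate for transformer weights.
# NOTE: the parameter count is a HYPOTHETICAL placeholder, and
# ~4.5 bits/weight is an approximation of Q4_K_M's average rate.

def weight_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a given precision."""
    return num_params * bits_per_weight / 8 / 2**30

params = 6e9  # hypothetical 6B-parameter unified transformer

bf16 = weight_gib(params, 16)     # full BF16 precision
q4_k_m = weight_gib(params, 4.5)  # ~4.5 bits/weight on average

print(f"BF16:   {bf16:.1f} GiB")    # ≈ 11.2 GiB
print(f"Q4_K_M: {q4_k_m:.1f} GiB")  # ≈ 3.1 GiB
```

Weights are only part of the VRAM budget (activations, the text encoder, and the VAE add more), which is why the GGUF script also relies on group offloading.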
### Option 2: High Precision (Diffusers/BF16)

Script: `infer_pretrained.py`

Use this version if you have ample VRAM (e.g., 24 GB+) and want to run the model in standard BFloat16 precision without quantization.

To run:

```shell
python infer_pretrained.py
```
Key features of this mode:

- Loads the model using the standard `from_pretrained` directory structure.
- Maintains full floating-point precision.
## 🖼️ Examples
### HED
Example 1 (without CFG):

- Steps: 9
- CFG: 0
- Control Scale: 0.7
- Prompt: A man holding a bottle
Example 2 (with CFG):

- Steps: 9
- CFG: 0
- Control Scale: 0.7
- Prompt: raw photo, portrait of a handsome Asian man sitting at a wooden table, holding a green glass bottle, wearing a black sweater, wristwatch, highly detailed skin texture, realistic pores, serious gaze, soft cinematic lighting, rim lighting, balanced exposure, 8k uhd, dslr, sharp focus, wood grain texture.
- Negative prompt: underexposed, crushed blacks, too dark, heavy shadows, makeup, smooth skin, plastic, wax, cartoon, illustration, distorted hands, bad anatomy, blur, haze, flat lighting.
### DEPTH
Example 3 (without CFG):

- Steps: 9
- CFG: 0
- Control Scale: 0.7
- Prompt: A cat
## 🛠️ Model Configuration
The inference scripts are pre-configured with parameters optimized for the Turbo nature of this model:
- Inference Steps: 9 steps (Fast generation).
- Guidance Scale: 0.0 (Turbo models do not use CFG).
- Conditioning Scale: 0.7 (Recommended strength for ControlNet).
- Shift: 3.0 (Scheduler shift parameter).
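The defaults above can be collected into a single mapping. The key names below are assumptions modeled on common diffusers pipeline arguments, not the verified signature of the custom `ZImageControlUnifiedPipeline`:

```python
# Illustrative defaults mirroring the values documented above.
# Key names are ASSUMPTIONS based on common diffusers conventions.
TURBO_DEFAULTS = {
    "num_inference_steps": 9,              # fast Turbo generation
    "guidance_scale": 0.0,                 # Turbo models do not use CFG
    "controlnet_conditioning_scale": 0.7,  # recommended control strength
    "shift": 3.0,                          # scheduler shift parameter
}

# A hypothetical call site would unpack these defaults, e.g.:
# image = pipe(prompt, control_image=control, **TURBO_DEFAULTS).images[0]
```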
## 📁 Repository Structure
- `z_image_turbo_control_unified_q4_k_m.gguf`: The unified, quantized model weights.
- `infer_gguf.py`: Script for running GGUF inference.
- `infer_pretrained.py`: Script for running standard Diffusers inference.
- `diffusers_local/`: Custom pipeline code (`ZImageControlUnifiedPipeline`) and transformer logic.
- `requirements.txt`: Python dependencies.