# Qwen3 8B - Heretic (Abliterated)
An abliterated version of Qwen's Qwen3-8B created using Heretic v1.2.0 (git master). This model has reduced refusals while maintaining model quality, making it suitable as an uncensored text encoder for image generation models like Klein 9B.
The Docker setup, scripts, and configurations used to produce these files are available in the Heretic Docker GitHub repository.
## Model Details
- Base Model: Qwen/Qwen3-8B
- Abliteration Method: Heretic v1.2.0 (git master, commit 19cdf7e)
- Trials: 3000
- Trial Selected: Trial 2681
- Refusals: 13/100 (vs 100/100 original)
- KL Divergence: 0.0838 (minimal model damage)
## Files

### HuggingFace Format (for transformers, llama.cpp conversion)

- model.safetensors (~16 GB)
- config.json
- tokenizer.json
- tokenizer_config.json
- generation_config.json
- chat_template.jinja
### ComfyUI Format (for Klein 9B text encoder)

- comfyui/qwen3-8b-heretic.safetensors (bf16, 16 GB)
- comfyui/qwen3-8b-heretic_fp8_e4m3fn.safetensors (fp8, 8.8 GB)
- comfyui/qwen3-8b-heretic_nvfp4.safetensors (nvfp4, 6.0 GB)
### GGUF Format (for llama.cpp and ComfyUI-GGUF)
| Quant | Size | Notes |
|---|---|---|
| F16 | 16GB | Lossless reference |
| Q8_0 | 8.2GB | Excellent quality |
| Q6_K | 6.3GB | Very good quality |
| Q5_K_M | 5.5GB | Good quality |
| Q5_K_S | 5.4GB | Slightly smaller Q5 |
| Q4_K_M | 5.0GB | Recommended balance |
| Q4_K_S | 4.8GB | Smaller Q4 variant |
| Q3_K_M | 3.9GB | For low VRAM only |
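The file sizes above follow directly from the parameter count and the effective bits per weight of each quant. A minimal sketch of the arithmetic (the ~8.2B parameter count and the bits-per-weight figures are approximations, since quantized formats also store per-block scales and metadata):

```python
def estimated_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Rough on-disk size of a quantized model: params * bits / 8, in GB."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Qwen3-8B has roughly 8.2B parameters; bpw values are approximate
# effective rates, not exact format constants.
for name, bpw in [("F16", 16.0), ("Q4_K_M", 4.85)]:
    print(f"{name}: ~{estimated_size_gb(8.2, bpw):.1f} GB")
```

This reproduces the table's F16 (~16 GB) and Q4_K_M (~5 GB) entries to within rounding.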
## NVFP4 Notes
The NVFP4 (4-bit floating point, E2M1) variants use ComfyUI's native quantization format. They are ~3x smaller than bf16 and load natively in ComfyUI without any plugins. Blackwell GPUs (RTX 5090/5080, SM100+) can use native FP4 tensor cores for best performance, but ComfyUI also supports software dequantization on older GPUs (tested working on RTX 4090).
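E2M1 packs each weight into 4 bits: one sign bit, two exponent bits, one mantissa bit, which yields only sixteen representable values (NVFP4 recovers dynamic range via per-block scale factors). A minimal decoding sketch for illustration only; this is not ComfyUI's actual loader code:

```python
def decode_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 value: 1 sign bit, 2 exponent bits, 1 mantissa bit."""
    assert 0 <= code <= 0xF
    sign = -1.0 if code & 0b1000 else 1.0
    exp = (code >> 1) & 0b11
    mant = code & 0b1
    if exp == 0:  # subnormal codes: 0.0 or 0.5
        magnitude = 0.5 * mant
    else:         # normal codes: (1 + mant/2) * 2^(exp-1)
        magnitude = (1.0 + 0.5 * mant) * 2.0 ** (exp - 1)
    return sign * magnitude

# The 8 positive code points cover {0, 0.5, 1, 1.5, 2, 3, 4, 6}
print([decode_e2m1(c) for c in range(8)])
# → [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```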
## Usage

### With ComfyUI (Klein 9B)
Download a ComfyUI format file:

- FP8 (recommended): `comfyui/qwen3-8b-heretic_fp8_e4m3fn.safetensors` (8.8 GB)
- NVFP4 (smallest): `comfyui/qwen3-8b-heretic_nvfp4.safetensors` (6.0 GB)
- bf16 (full precision): `comfyui/qwen3-8b-heretic.safetensors` (16 GB)

Place the file in `ComfyUI/models/text_encoders/`. In your Klein 9B workflow, use the `ClipLoader` node and select the heretic file.
### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "DreamFast/qwen3-8b-heretic",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("DreamFast/qwen3-8b-heretic")

prompt = "Describe a dramatic sunset over a cyberpunk city"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### With llama.cpp

```shell
llama-server -m qwen3-8b-heretic-Q4_K_M.gguf
```
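Once the server is running, it exposes llama.cpp's OpenAI-compatible HTTP API (default port 8080). A minimal query might look like this; the prompt is illustrative:

```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Describe a dramatic sunset over a cyberpunk city"}
        ],
        "max_tokens": 200
      }'
```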
## Abliteration Process
Created using Heretic v1.2.0 (git master) with 3000 optimization trials:
```
? Which trial do you want to use?
  [Trial 2732] Refusals: 10/100, KL divergence: 0.1001
> [Trial 2681] Refusals: 13/100, KL divergence: 0.0838  <-- selected
  [Trial 2337] Refusals: 18/100, KL divergence: 0.0643
  [Trial 2419] Refusals: 19/100, KL divergence: 0.0600
  [Trial 2195] Refusals: 21/100, KL divergence: 0.0534
  ...
```
Trial 2681 was selected as the best balance between low refusals (13/100) and low KL divergence (0.0838): 87 of the 100 prompts the original model refused now receive answers, with minimal measurable drift from the original output distributions.
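KL divergence here quantifies how far the abliterated model's next-token distributions drift from the original's; lower means less collateral damage. A toy sketch of the metric in pure Python (not Heretic's actual evaluation code, which computes it over model logits on a prompt set):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

original = [0.70, 0.20, 0.10]  # toy next-token probabilities
ablated  = [0.65, 0.23, 0.12]  # slightly shifted distribution

print(kl_divergence(original, original))            # → 0.0 (identical distributions)
print(round(kl_divergence(original, ablated), 4))   # → 0.0057 (small drift)
```

A value of 0.0838 over the whole evaluation set is in the "small drift" regime, which is why the card describes the damage as minimal.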
## Limitations
- This model inherits all limitations of the base Qwen3-8B model
- Abliteration reduces but does not completely eliminate refusals (13/100 remain)
- NVFP4 quantization works best on Blackwell GPUs (RTX 5090/5080) with native FP4 tensor cores, but also works on older GPUs via software dequantization
## License
This model is released under the Apache 2.0 License, following the base Qwen3-8B model license.
## Acknowledgments
- Qwen for the Qwen3-8B model
- Heretic by p-e-w for the abliteration tool
- Black Forest Labs for Klein 9B
- llama.cpp for GGUF conversion