# Qwen3 8B - Heretic (Abliterated)
An abliterated version of Qwen's Qwen3-8B created using Heretic v1.2.0 (git master). This model has reduced refusals while maintaining model quality, making it suitable as an uncensored text encoder for image generation models like Klein 9B.
The Docker setup, scripts, and configurations used to produce these files are available in the Heretic Docker GitHub repository.
## Model Details
- Base Model: Qwen/Qwen3-8B
- Abliteration Method: Heretic v1.2.0 (git master, commit 19cdf7e)
- Trials: 3000
- Trial Selected: Trial 2681
- Refusals: 13/100 (vs 100/100 original)
- KL Divergence: 0.0838 (minimal model damage)
## Files

### HuggingFace Format (for transformers, llama.cpp conversion)

- model.safetensors (~16 GB)
- config.json
- tokenizer.json
- tokenizer_config.json
- generation_config.json
- chat_template.jinja
### ComfyUI Format (for Klein 9B text encoder)

- comfyui/qwen3-8b-heretic.safetensors (bf16, 16 GB)
- comfyui/qwen3-8b-heretic_fp8_e4m3fn.safetensors (fp8, 8.8 GB)
- comfyui/qwen3-8b-heretic_nvfp4.safetensors (nvfp4, 6.0 GB)
### GGUF Format (for llama.cpp and ComfyUI-GGUF)
| Quant | Size | Notes |
|---|---|---|
| F16 | 16GB | Lossless reference |
| Q8_0 | 8.2GB | Excellent quality |
| Q6_K | 6.3GB | Very good quality |
| Q5_K_M | 5.5GB | Good quality |
| Q5_K_S | 5.4GB | Slightly smaller Q5 |
| Q4_K_M | 5.0GB | Recommended balance |
| Q4_K_S | 4.8GB | Smaller Q4 variant |
| Q3_K_M | 3.9GB | For low VRAM only |
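The file sizes above follow directly from the parameter count and the effective bits per weight of each quant. A minimal sketch of the arithmetic (the ~8.2B parameter count and the bits-per-weight figures are approximations, since quantized formats also store per-block scales and metadata):

```python
def estimated_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Rough on-disk size of a quantized model: params * bits / 8, in GB."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Qwen3-8B has roughly 8.2B parameters; bpw values are approximate
# effective rates, not exact format constants.
for name, bpw in [("F16", 16.0), ("Q4_K_M", 4.85)]:
    print(f"{name}: ~{estimated_size_gb(8.2, bpw):.1f} GB")
```

This reproduces the table's F16 (~16 GB) and Q4_K_M (~5 GB) entries to within rounding.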
## NVFP4 Notes
The NVFP4 (4-bit floating point, E2M1) variants use ComfyUI's native quantization format. They are ~3x smaller than bf16 and load natively in ComfyUI without any plugins. Blackwell GPUs (RTX 5090/5080, SM100+) can use native FP4 tensor cores for best performance, but ComfyUI also supports software dequantization on older GPUs (tested working on RTX 4090).
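E2M1 packs each weight into 4 bits: one sign bit, two exponent bits, one mantissa bit, which yields only sixteen representable values (NVFP4 recovers dynamic range via per-block scale factors). A minimal decoding sketch for illustration only; this is not ComfyUI's actual loader code:

```python
def decode_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 value: 1 sign bit, 2 exponent bits, 1 mantissa bit."""
    assert 0 <= code <= 0xF
    sign = -1.0 if code & 0b1000 else 1.0
    exp = (code >> 1) & 0b11
    mant = code & 0b1
    if exp == 0:  # subnormal codes: 0.0 or 0.5
        magnitude = 0.5 * mant
    else:         # normal codes: (1 + mant/2) * 2^(exp-1)
        magnitude = (1.0 + 0.5 * mant) * 2.0 ** (exp - 1)
    return sign * magnitude

# The 8 positive code points cover {0, 0.5, 1, 1.5, 2, 3, 4, 6}
print([decode_e2m1(c) for c in range(8)])
# → [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```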
## Usage

### With ComfyUI (Klein 9B)
Download a ComfyUI format file:

- FP8 (recommended): `comfyui/qwen3-8b-heretic_fp8_e4m3fn.safetensors` (8.8 GB)
- NVFP4 (smallest): `comfyui/qwen3-8b-heretic_nvfp4.safetensors` (6.0 GB)
- bf16 (full precision): `comfyui/qwen3-8b-heretic.safetensors` (16 GB)

Place the file in `ComfyUI/models/text_encoders/`. In your Klein 9B workflow, use the `ClipLoader` node and select the heretic file.
### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "DreamFast/qwen3-8b-heretic",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("DreamFast/qwen3-8b-heretic")

prompt = "Describe a dramatic sunset over a cyberpunk city"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### With llama.cpp

```shell
llama-server -m qwen3-8b-heretic-Q4_K_M.gguf
```
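Once the server is running, it exposes llama.cpp's OpenAI-compatible HTTP API (default port 8080). A minimal query might look like this; the prompt is illustrative:

```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Describe a dramatic sunset over a cyberpunk city"}
        ],
        "max_tokens": 200
      }'
```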
## Abliteration Process
Created using Heretic v1.2.0 (git master) with 3000 optimization trials:
```
? Which trial do you want to use?
  [Trial 2732] Refusals: 10/100, KL divergence: 0.1001
> [Trial 2681] Refusals: 13/100, KL divergence: 0.0838  <-- selected
  [Trial 2337] Refusals: 18/100, KL divergence: 0.0643
  [Trial 2419] Refusals: 19/100, KL divergence: 0.0600
  [Trial 2195] Refusals: 21/100, KL divergence: 0.0534
  ...
```
Trial 2681 was selected as the best balance between low refusals (13/100) and low KL divergence (0.0838): 87 of the 100 prompts the original model refused now receive answers, with minimal measurable drift from the original output distributions.
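KL divergence here quantifies how far the abliterated model's next-token distributions drift from the original's; lower means less collateral damage. A toy sketch of the metric in pure Python (not Heretic's actual evaluation code, which computes it over model logits on a prompt set):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

original = [0.70, 0.20, 0.10]  # toy next-token probabilities
ablated  = [0.65, 0.23, 0.12]  # slightly shifted distribution

print(kl_divergence(original, original))            # → 0.0 (identical distributions)
print(round(kl_divergence(original, ablated), 4))   # → 0.0057 (small drift)
```

A value of 0.0838 over the whole evaluation set is in the "small drift" regime, which is why the card describes the damage as minimal.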
## Limitations
- This model inherits all limitations of the base Qwen3-8B model
- Abliteration reduces but does not completely eliminate refusals (13/100 remain)
- NVFP4 quantization works best on Blackwell GPUs (RTX 5090/5080) with native FP4 tensor cores, but also works on older GPUs via software dequantization
## License
This model is released under the Apache 2.0 License, following the base Qwen3-8B model license.
## Acknowledgments
- Qwen for the Qwen3-8B model
- Heretic by p-e-w for the abliteration tool
- Black Forest Labs for Klein 9B
- llama.cpp for GGUF conversion