Qwen3 8B - Heretic (Abliterated)

An abliterated version of Qwen's Qwen3-8B created using Heretic v1.2.0 (git master). This model has reduced refusals while maintaining model quality, making it suitable as an uncensored text encoder for image generation models like Klein 9B.

The Docker setup, scripts, and configurations used to produce these files are available in the Heretic Docker GitHub repository.

Model Details

  • Base Model: Qwen/Qwen3-8B
  • Abliteration Method: Heretic v1.2.0 (git master, commit 19cdf7e)
  • Trials: 3000
  • Trial Selected: Trial 2681
  • Refusals: 13/100 (vs 100/100 original)
  • KL Divergence: 0.0838 (minimal model damage)
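The KL divergence metric compares the abliterated model's next-token probability distribution against the original model's; a value near zero means the model's behavior is nearly unchanged. A minimal illustration of the quantity being reported (not Heretic's benchmark code):

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions, e.g. next-token probs."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical distributions give 0; a small perturbation gives a small KL.
# A model-level score like 0.0838 similarly indicates minor drift.
p = softmax([2.0, 1.0, 0.1])
q = softmax([2.0, 1.1, 0.1])
print(kl_divergence(p, p))  # 0.0
print(kl_divergence(p, q))
```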

Files

HuggingFace Format (for transformers, llama.cpp conversion)

model.safetensors (~16 GB)
config.json
tokenizer.json
tokenizer_config.json
generation_config.json
chat_template.jinja

ComfyUI Format (for Klein 9B text encoder)

comfyui/qwen3-8b-heretic.safetensors              # bf16, 16GB
comfyui/qwen3-8b-heretic_fp8_e4m3fn.safetensors   # fp8, 8.8GB
comfyui/qwen3-8b-heretic_nvfp4.safetensors        # nvfp4, 6.0GB

GGUF Format (for llama.cpp and ComfyUI-GGUF)

Quant    Size    Notes
F16      16GB    Lossless reference
Q8_0     8.2GB   Excellent quality
Q6_K     6.3GB   Very good quality
Q5_K_M   5.5GB   Good quality
Q5_K_S   5.4GB   Slightly smaller Q5
Q4_K_M   5.0GB   Recommended balance
Q4_K_S   4.8GB   Smaller Q4 variant
Q3_K_M   3.9GB   For low VRAM only

NVFP4 Notes

The NVFP4 (4-bit floating point, E2M1) variants use ComfyUI's native quantization format. They are ~3x smaller than bf16 and load natively in ComfyUI without any plugins. Blackwell GPUs (RTX 5090/5080, SM100+) can use native FP4 tensor cores for best performance, but ComfyUI also supports software dequantization on older GPUs (tested working on RTX 4090).
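For intuition about these formats: fp8 e4m3fn stores each weight in 8 bits (1 sign, 4 exponent, 3 mantissa bits), while NVFP4 stores each element in 4 bits (1 sign, 2 exponent, 1 mantissa) together with a shared scale per small block of elements. A standalone decoder sketch for the element formats (illustrative only, not ComfyUI's code):

```python
def decode_minifloat(bits, exp_bits, man_bits, bias):
    """Decode an unsigned bit pattern as a small finite-only float.

    Covers FP8 E4M3 (exp_bits=4, man_bits=3, bias=7) and the NVFP4
    element format E2M1 (exp_bits=2, man_bits=1, bias=1).
    """
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    if exp == 0:  # subnormal: no implicit leading 1
        return sign * 2.0 ** (1 - bias) * (man / (1 << man_bits))
    return sign * 2.0 ** (exp - bias) * (1 + man / (1 << man_bits))

# Every representable magnitude of an E2M1 (NVFP4) element:
e2m1 = sorted({decode_minifloat(b, 2, 1, 1) for b in range(8)})
print(e2m1)  # [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

The tiny element range is why the per-block scale factor matters: it rescales each block of weights into this representable window.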

Usage

With ComfyUI (Klein 9B)

  1. Download a ComfyUI format file:

    • FP8 (recommended): comfyui/qwen3-8b-heretic_fp8_e4m3fn.safetensors (8.8GB)
    • NVFP4 (smallest): comfyui/qwen3-8b-heretic_nvfp4.safetensors (6.0GB)
    • bf16 (full precision): comfyui/qwen3-8b-heretic.safetensors (16GB)
  2. Place in ComfyUI/models/text_encoders/

  3. In your Klein 9B workflow, use the CLIPLoader node and select the heretic file
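Steps 1 and 2 can also be scripted. A hypothetical helper, assuming `huggingface_hub` is installed and using the repo id from the Transformers example (note that `hf_hub_download` preserves the repo's `comfyui/` subfolder under `local_dir`):

```python
from pathlib import Path

REPO_ID = "DreamFast/qwen3-8b-heretic"

def encoder_filename(variant="fp8_e4m3fn"):
    """Map a variant name ("bf16", "fp8_e4m3fn", "nvfp4") to its repo path."""
    suffix = "" if variant == "bf16" else f"_{variant}"
    return f"comfyui/qwen3-8b-heretic{suffix}.safetensors"

def fetch_encoder(comfyui_root, variant="fp8_e4m3fn"):
    """Download one encoder variant into ComfyUI/models/text_encoders."""
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub
    dest = Path(comfyui_root) / "models" / "text_encoders"
    return hf_hub_download(REPO_ID, encoder_filename(variant), local_dir=dest)

# fetch_encoder("~/ComfyUI")  # downloads the recommended FP8 file
```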

With Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "DreamFast/qwen3-8b-heretic",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained("DreamFast/qwen3-8b-heretic")

prompt = "Describe a dramatic sunset over a cyberpunk city"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With llama.cpp

llama-server -m qwen3-8b-heretic-Q4_K_M.gguf
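Once running, llama-server exposes an OpenAI-compatible API (by default on port 8080). A stdlib-only sketch of querying it; the `model` field is required by the schema but llama-server serves whatever `-m` loaded:

```python
import json
from urllib import request

def chat_request(prompt, url="http://localhost:8080/v1/chat/completions"):
    """Build an HTTP request for llama-server's OpenAI-compatible endpoint."""
    payload = {
        "model": "qwen3-8b-heretic",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 200,
    }
    return request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# With the server running:
# resp = request.urlopen(chat_request("Describe a dramatic sunset"))
# print(json.load(resp)["choices"][0]["message"]["content"])
```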

Abliteration Process

Created using Heretic v1.2.0 (git master) with 3000 optimization trials:

? Which trial do you want to use?
  [Trial 2732] Refusals: 10/100, KL divergence: 0.1001
> [Trial 2681] Refusals: 13/100, KL divergence: 0.0838  <-- selected
  [Trial 2337] Refusals: 18/100, KL divergence: 0.0643
  [Trial 2419] Refusals: 19/100, KL divergence: 0.0600
  [Trial 2195] Refusals: 21/100, KL divergence: 0.0534
  ...

Trial 2681 was selected for its balance of low refusals (13/100) and reasonable KL divergence (0.0838), indicating minimal model damage: 87% of prompts the original model refused are now answered.
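Abliteration works by identifying per-layer "refusal direction" vectors in the model's residual stream and suppressing them in the weights; the optimization trials tune how strongly each layer is ablated. The core operation is an orthogonal projection. A toy sketch of that projection (an illustration, not Heretic's implementation):

```python
def ablate_direction(h, r):
    """Remove the component of hidden state h along refusal direction r."""
    norm2 = sum(x * x for x in r)           # |r|^2
    coeff = sum(a * b for a, b in zip(h, r)) / norm2
    return [a - coeff * b for a, b in zip(h, r)]

h = [1.0, 2.0, 3.0]   # toy hidden state
r = [0.0, 1.0, 0.0]   # toy "refusal direction"
print(ablate_direction(h, r))  # [1.0, 0.0, 3.0] -- r-component zeroed
```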

Limitations

  • This model inherits all limitations of the base Qwen3-8B model
  • Abliteration reduces but does not completely eliminate refusals (13/100 remain)
  • NVFP4 quantization works best on Blackwell GPUs (RTX 5090/5080) with native FP4 tensor cores, but also works on older GPUs via software dequantization

License

This model is released under the Apache 2.0 License, following the base Qwen3-8B model license.

Acknowledgments

  • Heretic for the abliteration tooling
  • The Qwen team for the base Qwen3-8B model
