OmniGene-4-CPT-v2-4bit

BF16 model with automatic 4-bit quantization for RTX 5090 (32GB)

The weights are quantized to 4-bit automatically at load time, so the model needs only ~13GB of GPU memory.

Model Description

OmniGene-4-CPT-v2-4bit is a biological foundation model with:

Base: Gemma-4-26B-A4B-Instruct (MoE, 128 experts, top-8 routing)
Vocabulary: 290,048 tokens (262,020 original + 28,028 bio tokens)
CPT data: 32.5 GB mixed corpus (DNA, Protein, OpenWebText, Structure)
Training: 0.6 epochs, 2,806 steps, 8×H20 GPUs
Storage: BF16 (~49 GB, 32 shards of ~1.5GB each)
Runtime: Automatic 4-bit quantization (~13GB GPU memory)

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model; the bundled quantization config applies 4-bit quantization at load time
model = AutoModelForCausalLM.from_pretrained(
    "dnagpt/OmniGene-4-CPT-v2-4bit",
    device_map="auto",  # Place the quantized weights on the available GPU(s)
)
tokenizer = AutoTokenizer.from_pretrained("dnagpt/OmniGene-4-CPT-v2-4bit")

# Generate
prompt = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKRQTLGQHDFSAGEGLYTHMKALRPDEDRLSPLHSVYVDQWDWERVMGDGERQFSTLKSTVEAIWAGIKATEAAVSEEFGLAPFLPDQIHFVHSQELLSRYPDLDAKGRERAIAKDLGAVFLVGIGGKLSDGHRHDVRAPDYDDWSTPSELGHAGLNGDILVWNPVLEDAFELSSMGIRVDADTLKHQLALTGDEDRLELEWHQALLRGEMPQTIGGGIGQSRLTMLLLQLPHIGQVQAGVWPAAVRESVPSLL"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Hardware Requirements

GPU Memory: ~13-15GB (after automatic 4-bit quantization)
Recommended: RTX 5090 (32GB), RTX 4090 (24GB), or better
Minimum: RTX 3090 (24GB)
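
Before loading, it can help to confirm that the local GPU meets the memory figure above. The following is a minimal sketch, assuming PyTorch with CUDA support is installed. The 15 GB threshold is the upper end of the range quoted above, and the helper names `has_enough_memory` and `check_local_gpu` are illustrative, not part of any library.

```python
REQUIRED_GB = 15.0  # assumption: upper end of the ~13-15GB range quoted above


def has_enough_memory(total_bytes: int, required_gb: float = REQUIRED_GB) -> bool:
    """Return True if a device with `total_bytes` of memory meets the requirement."""
    return total_bytes / 1e9 >= required_gb


def check_local_gpu(device_index: int = 0) -> bool:
    """Check one CUDA device; returns False when no GPU is visible."""
    import torch  # imported lazily so the pure helper above works without PyTorch

    if not torch.cuda.is_available():
        return False
    props = torch.cuda.get_device_properties(device_index)
    return has_enough_memory(props.total_memory)
```

Calling `check_local_gpu()` reports whether device 0 clears the threshold. Long prompts and large `max_new_tokens` values add activation and cache memory on top of the weights, so some headroom beyond the threshold is advisable.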

Quantization Details

This model uses bitsandbytes NF4 quantization with double quantization:

Method: NF4 (Normal Float 4-bit)
Compute dtype: bfloat16
Double quantization: Yes
Quality: Minimal accuracy loss compared to BF16

The quantization happens automatically when you load the model thanks to the included quantization_config.json.
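
The same settings can also be written out explicitly, for instance to apply them to BF16 weights that ship without a quantization config. This is a minimal sketch, assuming recent `transformers`, `bitsandbytes`, and `accelerate` installs. The names `NF4_SETTINGS`, `build_nf4_config`, and `load_explicitly` are illustrative, and the exact values stored in this repository's config have not been verified here.

```python
# Settings from the list above, expressed as BitsAndBytesConfig keyword arguments.
NF4_SETTINGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
}


def build_nf4_config():
    """Build an explicit NF4 config with bfloat16 compute."""
    import torch
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(bnb_4bit_compute_dtype=torch.bfloat16, **NF4_SETTINGS)


def load_explicitly(repo_id: str):
    """Load `repo_id` with the explicit config (repo_id is supplied by the caller)."""
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        repo_id,
        quantization_config=build_nf4_config(),
        device_map="auto",
    )
```

When a repository already carries its own quantization config, `transformers` may prefer the bundled one and emit a warning, so the explicit route is mainly useful for unquantized checkpoints.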

Download Size vs Runtime Size

Download: ~49GB (BF16 weights, 32 shards)
Disk: ~49GB
GPU Memory: ~13GB (after automatic quantization)

The model is stored in BF16 for maximum quality, then quantized to 4-bit at load time.
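
These figures follow from simple arithmetic on the parameter count. The sketch below is a back-of-the-envelope estimate, assuming ~26B parameters and counting weights only. It ignores activations, the KV cache, quantization constants, and any layers left unquantized, so real usage can sit somewhat above the raw 4-bit number. The function name `weight_size_gb` is illustrative.

```python
def weight_size_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 26e9  # "~26B" from the model card; an approximation

bf16_gb = weight_size_gb(TOTAL_PARAMS, 16)  # BF16 stores 2 bytes per parameter
nf4_gb = weight_size_gb(TOTAL_PARAMS, 4)    # NF4 stores 4 bits per parameter
bf16_gib = TOTAL_PARAMS * 2 / 2**30         # same BF16 size in binary units

print(f"BF16 weights: ~{bf16_gb:.0f} GB (~{bf16_gib:.1f} GiB)")
print(f"NF4 weights:  ~{nf4_gb:.0f} GB")
```

The BF16 estimate expressed in GiB lands close to the ~49GB download size, and the 4-bit estimate matches the ~13GB runtime figure.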

Model Architecture

Layers: 30 transformer layers
Experts: 128 experts per layer (top-8 routing)
Hidden size: 2816
Attention heads: 22
Active parameters: ~3.8B per token
Total parameters: ~26B
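
Top-8 routing means each token is processed by only 8 of the 128 experts in a layer, which is why the active parameter count is much smaller than the total. The sketch below illustrates generic top-k routing for a single token in plain Python. It is an illustration under stated assumptions, not the base model's implementation: whether the softmax is applied before or after selection, and whether shared experts exist, is not stated in this card. The function name `route_token` and the random logits are hypothetical.

```python
import math
import random

NUM_EXPERTS = 128  # experts per layer, from the list above
TOP_K = 8          # experts selected per token


def route_token(router_logits: list[float], top_k: int = TOP_K) -> dict[int, float]:
    """Pick the top-k experts for one token and return normalized mixing weights."""
    # Indices of the k largest router logits.
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    # Softmax restricted to the chosen experts (subtract the max for stability).
    peak = max(router_logits[i] for i in chosen)
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router output
weights = route_token(logits)

print(f"experts used: {len(weights)} of {NUM_EXPERTS}")
print(f"mixing weights sum to {sum(weights.values()):.3f}")
```

The token's output would then be the weighted sum of the chosen experts' outputs. Components outside the expert layers, such as attention and embeddings, run for every token regardless of routing.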

Biological Tokens

The model includes 28,028 additional biological tokens:

DNA BPE: 20,000 tokens (optimized for genomic sequences)
Protein BPE: 8,000 tokens (optimized for amino acid sequences)
3Di alphabet: 20 tokens (Foldseek structural alphabet)
DSSP: 8 tokens (secondary structure: H, E, C, etc.)
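
The token counts above can be cross-checked against the vocabulary size given in the Model Description. This is a small arithmetic sketch using only numbers from this card; the variable names are illustrative.

```python
ORIGINAL_VOCAB = 262_020  # base vocabulary size stated in Model Description

BIO_TOKENS = {
    "DNA BPE": 20_000,
    "Protein BPE": 8_000,
    "3Di alphabet": 20,
    "DSSP": 8,
}

bio_total = sum(BIO_TOKENS.values())
full_vocab = ORIGINAL_VOCAB + bio_total

print(f"bio tokens: {bio_total:,}")
print(f"full vocabulary: {full_vocab:,}")
```

With the tokenizer loaded as in Quick Start, `len(tokenizer)` can be compared against `full_vocab`, assuming the published tokenizer matches the numbers in this card.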

Training Data

Source	Size	Tokens	Proportion (by size)
DNA (human genome)	8.0 GB	2.1B	24.6%
Protein (UniProt)	8.0 GB	2.1B	24.6%
Protein (LucaOne)	7.5 GB	2.0B	23.1%
OpenWebText	8.0 GB	2.1B	24.6%
Structure (3Di + DSSP)	0.4 GB	0.1B	1.2%
Instruction replay	0.6 GB	0.4B	1.9%
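
The proportions in the table are shares of total corpus size, which the sketch below recomputes from the listed sizes. It also derives a rough tokens-per-step figure from the 0.6 epochs and 2,806 steps given under Model Description. That last number is an estimate only: it assumes one epoch means a single pass over this whole corpus, which the card does not state. Because the sizes are rounded, recomputed shares can differ from the table in the last digit.

```python
# (size in GB, tokens in billions), copied from the table above
CORPUS = {
    "DNA (human genome)": (8.0, 2.1),
    "Protein (UniProt)": (8.0, 2.1),
    "Protein (LucaOne)": (7.5, 2.0),
    "OpenWebText": (8.0, 2.1),
    "Structure (3Di + DSSP)": (0.4, 0.1),
    "Instruction replay": (0.6, 0.4),
}

total_gb = sum(size for size, _ in CORPUS.values())
total_tokens_b = sum(tokens for _, tokens in CORPUS.values())

for name, (size, _) in CORPUS.items():
    print(f"{name:<24} {size / total_gb:6.1%} of bytes")
print(f"total: {total_gb:.1f} GB, {total_tokens_b:.1f}B tokens")

# Rough estimate under the assumption described above.
EPOCHS, STEPS = 0.6, 2806
tokens_per_step_m = EPOCHS * total_tokens_b * 1e9 / STEPS / 1e6
print(f"~{tokens_per_step_m:.1f}M tokens per step (estimate)")
```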

Other Versions

Full BF16 (no quantization): https://huggingface.co/dnagpt/OmniGene-4-CPT-v2-merged
LoRA adapter (requires base model): https://huggingface.co/dnagpt/OmniGene-4-CPT-v2
Instruction-tuned: https://huggingface.co/dnagpt/OmniGene-4-SFT-v3-4bit

Citation

@article{wang2026omnigene4,
  title={OmniGene-4: A Unified Bio-Language MoE Model with Router-Level Interpretability},
  author={Wang, Liang},
  journal={bioRxiv},
  year={2026}
}

Paper

Full paper: https://github.com/maris205/omnigene4

License

Apache 2.0

Contact

Liang Wang (wangliang.f@gmail.com)
School of Artificial Intelligence and Automation
Huazhong University of Science and Technology
