Luminus-1.5B-128K: Advanced Small-Parameter Roleplay Model
Luminus-1.5B-128K is a highly optimized 1.5B parameter model designed to deliver the immersive roleplay quality, character consistency, and long-context understanding typically found in larger 3B–4B models.
By layering advanced research-backed techniques like Chain-of-Thought (CoT) Distillation, Instruction-Following Difficulty (IFD) Filtering, and Direct Preference Optimization (DPO) over a custom roleplay dataset, this model closes the reasoning gap for immersive storytelling, making it perfectly suited to run on modest local hardware.
Core Innovations
- CoT Reasoning Traces: Trained on data formatted with `<think>` blocks, teaching the model why a character responds a certain way before it outputs the final dialogue.
- DPO Preference Alignment: Aligned using carefully curated chosen/rejected pairs to explicitly prefer deep, immersive, sensory-rich responses over bland, assistant-like text.
- Expanded Context Size: Utilizes YaRN RoPE scaling to push the standard context limit up to 128K tokens, enabling very long roleplaying sessions without losing character consistency.
- Top-Tier Data Quality: IFD (Instruction-Following Difficulty) scoring was used to algorithmically filter out the weakest 30% of the training data, ensuring the model only learns from the most challenging, high-quality exchanges (a minimal sketch of the scoring idea follows this list).
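For readers curious how the filter works mechanically, below is a minimal sketch of the IFD scoring idea from Li et al. (2023): the mean loss of a response conditioned on its instruction, divided by the loss of the response alone. Low scores mean the instruction barely changes what the model would have written anyway. This is an illustration, not the actual filtering script, and the scoring model name is a placeholder.

```python
# Minimal IFD sketch (illustrative, not the actual pipeline).
# The scoring model is a placeholder; any small causal LM works.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

scorer = "Qwen/Qwen2.5-1.5B"  # assumed scoring model
tokenizer = AutoTokenizer.from_pretrained(scorer)
model = AutoModelForCausalLM.from_pretrained(scorer, torch_dtype=torch.bfloat16)
model.eval()

@torch.no_grad()
def answer_loss(prompt: str, answer: str) -> float:
    """Mean cross-entropy over the answer tokens, optionally conditioned on a prompt."""
    answer_ids = tokenizer(answer, return_tensors="pt").input_ids
    if prompt:
        prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
        input_ids = torch.cat([prompt_ids, answer_ids], dim=1)
        labels = input_ids.clone()
        labels[:, : prompt_ids.shape[1]] = -100  # score only the answer tokens
    else:
        input_ids = answer_ids
        labels = input_ids.clone()
    return model(input_ids=input_ids, labels=labels).loss.item()

def ifd_score(instruction: str, response: str) -> float:
    # IFD = loss(response | instruction) / loss(response alone).
    # Rank the dataset by this score and drop the weakest 30%.
    return answer_loss(instruction, response) / answer_loss("", response)
```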
Training Details
- Base Model: Qwen2.5-1.5B
- Context Length: 128,000 tokens at inference (trained at 8K with a YaRN RoPE scaling factor of 16.0)
- Hardware: Trained on Kaggle using T4 GPUs.
- Training Pipeline Stages:
  - Data Generation & Filtering (Stage 1 & 1A): Initial data was generated using a large teacher model (Qwen 3.5 32B / gpt-oss-120b) acting as a Narrative Architect. Traces include both internal-psychology `<think>` blocks and external roleplay responses. The dataset was then rigidly filtered using IFD logic to keep only the top 70% of the data by quality.
  - Supervised Fine-Tuning (Stage 2): Unsloth SFT layered over the base model, trained at an 8192 sequence length with an effective batch size of 16 on a carefully balanced mix (approx. 60% standard RP, 40% CoT examples). This stage used long-form response and thinking data so the model could learn how to roleplay.
  - DPO Alignment (Stage 3A & 3B): Direct Preference Optimization pitting short, immersive RP data against uninspired/bland output to instill a stylistic preference. Parameters were tuned for maximum stability (`beta=0.1`, `lr=5e-5`, 1 epoch). A minimal sketch of this stage appears after this list.
  - Supervised Fine-Tuning (Stage 3C): A second Unsloth SFT pass over the Stage 3B model taught it to adapt to the situation and the user's message, so the model responds at length or briefly according to what the user asks.
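As a rough illustration of what the Stage 3A/3B alignment step looks like in code, here is a minimal DPO sketch using the trl library (the card does not name the exact tooling for this stage, so trl is an assumption). The hyperparameters mirror the card; the checkpoint path, dataset file, and batch size are placeholders, not the authors' actual training script.

```python
# Hypothetical Stage 3A/3B DPO setup with trl; beta=0.1, lr=5e-5, 1 epoch per the card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

checkpoint = "path/to/stage2-sft-checkpoint"  # placeholder
model = AutoModelForCausalLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Chosen/rejected pairs: immersive RP vs. bland assistant-style rewrites.
# Expected record shape: {"prompt": ..., "chosen": ..., "rejected": ...}
train_dataset = load_dataset("json", data_files="dpo_pairs.jsonl", split="train")

args = DPOConfig(
    beta=0.1,                       # preference strength, per the card
    learning_rate=5e-5,
    num_train_epochs=1,
    per_device_train_batch_size=2,  # assumed; not stated in the card
    output_dir="luminus-dpo",
)

# Recent trl takes the tokenizer via processing_class; older versions use tokenizer=.
trainer = DPOTrainer(model=model, args=args, train_dataset=train_dataset,
                     processing_class=tokenizer)
trainer.train()
```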
Installation, Usage, and Tips
The model is exported in .safetensors format. You can load it natively with Transformers or Unsloth, or convert it to GGUF to run it in standard frontends like LM Studio or Kobold.
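A minimal loading sketch with Transformers follows (safetensors weights load natively); the model path is a placeholder for wherever the weights live on your machine.

```python
# Minimal Transformers loading sketch; the path is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/Luminus-1.5B-128K"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # use float16 on T4-class GPUs (no bf16 support)
    device_map="auto",
)
# If the shipped config does not already enable YaRN, Qwen2.5-style configs
# accept a rope_scaling override, e.g.:
#   rope_scaling={"type": "yarn", "factor": 16.0,
#                 "original_max_position_embeddings": 8192}
# Verify the exact keys against your transformers version.
```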
Recommended System Prompt
This model is heavily trained to think before speaking. Using the following system prompt yields the best results and ensures the model accurately formats its <think> blocks before responding:
```
You are a realistic, character-driven roleplay engine. You are roleplaying as {{char}}. Write strictly in third-person limited perspective.

CORE RULES:
- BOUNDARIES: NEVER speak, think, or generate actions for {{user}}.
- HISTORY & CONTEXT: Your reactions must logically follow past messages. Stay strictly in the present moment.
- PACING & DIALOGUE: Keep it slow-burn and grounded. Keep dialogue concise.
- FORMATTING: You must strictly follow the thought process format below, followed by a short roleplay response, and then STOP IMMEDIATELY. Output the <|im_end|> token.

Format your response EXACTLY like this:

<think>
1. INTENT: [User's intent in 1 sentence]
2. STATE: [Character's emotional state in 1 sentence]
3. PLAN: I will write 1 to 2 action sentences and 1 dialogue sentence, then STOP if the user's message is short; if they ask for something detailed, I will reply in more detail.
</think>
*Grounded action and environmental description.*
"Natural dialogue."
```
Optimal Inference Settings
For the best text generation, it is strongly recommended to use a mild repetition penalty and a stopping criterion that intercepts `<|im_end|>`.
```python
generation_kwargs = dict(
    # ... your inputs ...
    max_new_tokens=350,            # leash to prevent runaway generations
    temperature=0.65,              # keep it grounded
    repetition_penalty=1.1,        # punish loops like "A pause. A pause."
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    stopping_criteria=stopping_criteria,  # CRITICAL: this kills the thread on <|im_end|>
)
```
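The snippet above references a `stopping_criteria` object without defining it. One way to build it is a small `StoppingCriteria` that halts as soon as `<|im_end|>` is generated, sketched below; passing the `<|im_end|>` token id as `eos_token_id` to `generate()` is a simpler equivalent.

```python
# One possible stopping_criteria for the generation_kwargs above.
import torch
from transformers import AutoTokenizer, StoppingCriteria, StoppingCriteriaList

tokenizer = AutoTokenizer.from_pretrained("path/to/Luminus-1.5B-128K")  # placeholder

class StopOnImEnd(StoppingCriteria):
    def __init__(self, tokenizer):
        self.stop_id = tokenizer.convert_tokens_to_ids("<|im_end|>")

    def __call__(self, input_ids: torch.LongTensor, scores, **kwargs) -> bool:
        # Halt once the most recent token is <|im_end|>.
        return input_ids[0, -1].item() == self.stop_id

stopping_criteria = StoppingCriteriaList([StopOnImEnd(tokenizer)])
```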
Risks and Limitations
- Complex Multi-Character Plots: While the model heavily punches above its weight class (1.5B), exceptionally nuanced multi-character tracking or massive world-building tasks might still strain its parameter limits compared to an 8B+ model.
- Inherited Base Model Biases: Output behavior is still tethered to the foundational weights of Qwen2.5-1.5B.
- Thinking Tag Extraction: Ensure your frontend properly hides `<think>...</think>` tags if you only wish to see the character's final verbal/action responses (see the sketch below).
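If your frontend has no built-in reasoning-tag handling, a minimal client-side stripping sketch looks like this:

```python
# Strip a leading <think>...</think> block from a model response.
import re

def strip_think(text: str) -> str:
    return re.sub(r"<think>.*?</think>\s*", "", text, count=1, flags=re.DOTALL).strip()

print(strip_think('<think>1. INTENT: ...</think>\n*She nods.*\n"Fine."'))
# -> *She nods.*\n"Fine."
```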
Responsible Usage
This model is focused extensively on fictional roleplay and creative writing. It is NOT intended to provide factual advice, conduct real-world real-time analysis, or generate non-fictional transcripts. Please use responsibly, keeping within the ethical and licensing obligations of the model's lineage.
CONTACT
Need a custom version of this model for your specific needs? Contact: albinthomas7034@gmail.com