# Ouroboros V24: Cognitive Architecture for Reflexive Financial Reasoning
Ouroboros V24 is the latest iteration of a cognitive architecture for autonomous financial decision-making. It is built on a ~35B-parameter Mixture-of-Experts (MoE) base model with ~3B active parameters per token, trained through 24 iterative rounds of multi-reward GRPO under a 54-dimensional cognitive reward topology.
> ⚠️ Weights are not publicly released. This model card documents the architecture and training methodology. For research collaboration inquiries, contact the author.
## Architecture
### Base Model
- Type: Mixture-of-Experts (MoE)
- Total Parameters: ~35B
- Active Parameters: ~3B per token
- Context Window: 32K tokens
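The ~3B-active / ~35B-total split comes from sparse expert routing: the router scores all experts for each token, but only the top-k experts actually run. A minimal sketch, assuming a simple softmax-over-top-k router; the shapes, k, and expert form are illustrative, not the model's actual configuration:

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Sparse MoE layer sketch: route one token to its top-k experts.

    x       : (d,) token hidden state
    gate_w  : (n_experts, d) router weights (hypothetical shapes)
    experts : list of callables, each mapping (d,) -> (d,)
    """
    logits = gate_w @ x                         # router score per expert
    top = np.argsort(logits)[-k:]               # indices of the k best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                    # softmax over selected experts only
    # Only the chosen experts execute, which is why ~3B of ~35B params are active.
    return sum(w * experts[i](x) for w, i in zip(weights, top))
```

The per-token compute cost scales with k, not with the total expert count, which is the point of the sparse design.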
## Training Methodology
- Algorithm: R-GRPO (Reflexive Group Relative Policy Optimization)
- Training Rounds: 24 iterative cycles (V1 → V24)
- Adapter Strategy: 20-layer sequential LoRA merge chain
- Reward Architecture: SCRGNDWMT (9-tier, 54 sub-dimensions)
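R-GRPO builds on GRPO, which replaces a learned critic with advantages computed relative to a group of completions sampled for the same prompt. A minimal sketch of the group-relative advantage; the z-score normalization is a common choice, not necessarily the exact variant used here:

```python
import numpy as np

def group_relative_advantages(rewards):
    """GRPO-style advantage: z-score each completion's reward within its group.

    rewards : (group_size,) scalar rewards for one prompt's sampled completions
    """
    r = np.asarray(rewards, dtype=float)
    std = r.std()
    return (r - r.mean()) / (std + 1e-8)  # mean ~0 within the group
```

With group size 12 (see the training parameters below), each policy update compares 12 completions of the same prompt against one another rather than against an absolute baseline.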
## 9-Tier Reward Topology (SCRGNDWMT)
| Tier | Name | Sub-dimensions | Description |
|---|---|---|---|
| S | Structure | 6 | XML formatting, JSON decision blocks |
| C | Content | 7 | Domain expertise, data fidelity, causal depth |
| R | Reasoning | 5 | Temporal-causal chains, counterfactual depth |
| G | Game Theory | 5 | K-level thinking, deception detection, coalition |
| N | Narrative | 4 | Scenario construction, debate, arc coherence |
| D | Data Fidelity | 3 | Numerical accuracy, source attribution |
| W | World Model | 6 | Regime detection, cross-market transmission, macro |
| M | Metacognition | 7 | Self-awareness, Bayesian confidence, falsification |
| T | Temporal-Causal | 5 | Causal chains, temporal depth, granularity |
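The card does not publish how the sub-dimensions are combined into the scalar reward GRPO needs; a plausible scalarization is a mean over sub-dimensions per tier followed by a weighted sum across tiers. All weights below are hypothetical:

```python
# Hypothetical per-tier weights; the card does not publish the actual ones.
TIER_WEIGHTS = {"S": 1.0, "C": 1.5, "R": 1.5, "G": 1.0, "N": 0.5,
                "D": 1.0, "W": 1.0, "M": 1.5, "T": 1.0}

def total_reward(tier_scores):
    """Collapse tiered sub-dimension scores into one scalar.

    tier_scores : dict tier letter -> list of sub-dimension scores in [0, 1]
    """
    return sum(TIER_WEIGHTS[t] * (sum(subs) / len(subs))
               for t, subs in tier_scores.items())
```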
## V24 Upgrades (from V22)
- C7 (CausalChainDepthV2): Multi-step causal chains with time-lag annotations
- M7 (BayesianConfidence): Calibrated confidence field in JSON decisions
- W3 (CrossMarketPath): Structural contagion paths (Market A → Mechanism → Market B)
- M5 (FalsificationV2): Quantitative, price-based invalidation conditions
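Together, these upgrades imply a structured decision block carrying a calibrated confidence (M7), a time-lagged causal chain (C7), and a quantitative invalidation condition (M5). The schema below is a guess at that format; every field name and value is illustrative, not the model's published output:

```python
import json

# Hypothetical decision block illustrating M7, C7, and M5 together.
# All field names and values are assumptions for illustration only.
decision = {
    "action": "reduce_exposure",
    "asset": "SPX",
    "confidence": 0.62,               # M7: Bayesian confidence in [0, 1]
    "causal_chain": [                 # C7: causal steps with time-lag annotations
        {"step": "rate surprise", "lag_days": 0},
        {"step": "dollar strength", "lag_days": 2},
        {"step": "EM equity outflows", "lag_days": 5},
    ],
    # M5: a price-based condition that would falsify the thesis
    "falsification": "close above 5600 within 10 sessions invalidates the thesis",
}

blob = json.dumps(decision, indent=2)
assert 0.0 <= decision["confidence"] <= 1.0
```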
## Key Training Parameters
| Parameter | Value |
|---|---|
| Learning rate | 5 × 10⁻⁷ |
| Group size | 12 |
| Max completion tokens | 1000 |
| Temperature | 1.15 |
| β-annealing | Stable (β=0.05) ↔ Break-up (β=0.03) |
| LoRA rank | ≥ 10 |
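The β-annealing row alternates the KL coefficient between a Stable and a Break-up phase; one way to realize that is a step-indexed toggle. The phase length below is an assumption, since the table gives only the two β values:

```python
def beta_for_step(step, phase_len=10, beta_stable=0.05, beta_breakup=0.03):
    """Alternate the KL coefficient between Stable and Break-up phases.

    phase_len is a hypothetical phase length; the card publishes only
    the two beta values, not the switching schedule.
    """
    return beta_stable if (step // phase_len) % 2 == 0 else beta_breakup
```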
## Key Results
### Reflexive Intelligence Emergence
During V17 training, reflexive reasoning emerged through a discontinuous phase transition at Step 153: after 150+ steps of zero reflexivity scores, the capability appeared spontaneously and persisted for the remainder of training. This transition is documented in Papers P1-P3 of the research program.
### V24 Training (ongoing)
- 54-dimensional reward actively guiding cognitive development
- Bayesian confidence calibration observed from Step 18
- Cross-market causal reasoning emerging by Step 25
- Zero gradient failures through 55+ steps
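Whether stated confidence is actually calibrated (the M7 claim above) is typically checked by binning decisions by confidence and comparing each bin's mean confidence to its hit rate, i.e. expected calibration error. A minimal sketch over hypothetical decision outcomes:

```python
import numpy as np

def expected_calibration_error(confidences, outcomes, n_bins=10):
    """ECE: frequency-weighted gap between stated confidence and hit rate."""
    conf = np.asarray(confidences, dtype=float)
    hits = np.asarray(outcomes, dtype=float)
    bins = np.clip((conf * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # weight each bin by its share of decisions
            ece += mask.mean() * abs(conf[mask].mean() - hits[mask].mean())
    return ece
```

A model whose 0.62-confidence calls succeed about 62% of the time would score near zero.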
## Research Program
This model is part of a six-paper research program:
| Paper | Title | DOI |
|---|---|---|
| P1 | Reflexive Intelligence in LLMs | 10.5281/zenodo.19557261 |
| P2 | Observer Depth (ReflexBench) | 10.5281/zenodo.19627242 |
| P3 | When Rewards Collide (Multi-Reward GRPO) | 10.5281/zenodo.19665969 |
| P4 | Ouroboros V22 Architecture | 10.5281/zenodo.19666786 |
| P5 | The Cognitive Lifecycle | 10.5281/zenodo.19666806 |
| P6 | Cognitive Reward Topology | 10.5281/zenodo.19666829 |
## Related Resources
| Resource | Link |
|---|---|
| ReflexBench Dataset | `MMJBDS/reflexbench` |
| ReflexBench Eval Results | `MMJBDS/reflexbench-eval` |
| Papers Repository | [github.com/mmjbds/ouroboros-papers](https://github.com/mmjbds/ouroboros-papers) |
| Evaluation Code | [github.com/mmjbds/reflexbench](https://github.com/mmjbds/reflexbench) |
## Citation

```bibtex
@article{zhang2026ouroborosv22,
  title={Ouroboros V22: Bayesian Scenario Simulation and Recurrent Depth Cognition},
  author={Zhang, Mian},
  year={2026},
  doi={10.5281/zenodo.19666786}
}

@article{zhang2026topology,
  title={Cognitive Reward Topology: A Nine-Tier Architecture for Multi-Reward GRPO},
  author={Zhang, Mian},
  year={2026},
  doi={10.5281/zenodo.19666829}
}
```
## Author
- Mian Zhang — Independent AI Researcher
- ORCID: 0009-0001-9556-3839
- Email: 373743743@qq.com
- GitHub: @mmjbds
- Twitter/X: @Henry_Avery666
- LinkedIn: henryavery-mianzhang
## License
This model card is released under CC BY 4.0. Model weights are not publicly available.