mni-ml/transformer

A 12.3M-parameter decoder-only Transformer (GPT-style) trained in Node.js with @mni-ml/framework on the TinyStories corpus, using a HuggingFace-style ByteLevel BPE tokenizer (vocab 4096).

Source code, training scripts, and data-prep utilities live at github.com/mni-ml/transformer.

The HF inference widget is disabled for this model. It runs on a custom Node.js runtime (@mni-ml/framework) rather than the transformers library, so the widget cannot load it. See Running locally below.

Files

| File | Size | Description |
|------|------|-------------|
| model-final.json | ~249 MB | Final checkpoint: weights, config, and optimizer state, loaded by @mni-ml/framework |
| tokenizer.json | ~266 KB | HuggingFace-format ByteLevel BPE tokenizer (vocab 4096, special token `<\|endoftext\|>`) |

Architecture

Standard GPT-style decoder-only Transformer with pre-norm blocks, causal self-attention, learnable position embeddings, and weight-tied output head.

| Hyperparameter | Value |
|----------------|-------|
| Parameters | 12,322,816 |
| Layers (n_layer) | 6 |
| Attention heads (n_head) | 6 |
| Embedding dim (n_embd) | 384 |
| Head dim | 64 |
| Context window (block_size) | 256 tokens |
| Vocab size | 4,096 |
| Activation | GELU |
| Normalization | LayerNorm (pre-norm), ε = 1e-5 |

The full config is also embedded in model-final.json under the config key and is read automatically by the generate scripts.
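As a cross-check, the parameter total in the table can be reproduced with a few lines of arithmetic. This is one plausible accounting that matches the stated count, assuming biases on every linear layer and a bias on the weight-tied output head; the actual breakdown inside @mni-ml/framework may differ:

```javascript
// Parameter accounting for the config above (assumed layout, not the
// framework's actual internals): biases on all linears + tied-head bias.
const nLayer = 6, nEmbd = 384, blockSize = 256, vocab = 4096;
const mlpDim = 4 * nEmbd; // conventional 4x MLP expansion (assumption)

const embeddings = vocab * nEmbd + blockSize * nEmbd;   // token + position
const perLayer =
  2 * (2 * nEmbd) +                 // two LayerNorms (gain + bias)
  (nEmbd * 3 * nEmbd + 3 * nEmbd) + // fused QKV projection
  (nEmbd * nEmbd + nEmbd) +         // attention output projection
  (nEmbd * mlpDim + mlpDim) +       // MLP up-projection
  (mlpDim * nEmbd + nEmbd);         // MLP down-projection
const finalLn = 2 * nEmbd;
const headBias = vocab;             // output weights are tied, the bias is not

const total = embeddings + nLayer * perLayer + finalLn + headBias;
console.log(total); // → 12322816, matching the table
```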

Running locally

Because this model uses a custom JS runtime, you need three pieces to run inference: the @mni-ml/framework npm package and two source files (src/generate.js and src/bpe.js) from the GitHub repo.

Prerequisites

  • Node.js ≥ 22.18 (required by @mni-ml/framework)
  • git (to grab the source files) and hf CLI (to download the weights)

Step-by-step

```bash
# 1. Clone the source repo (needed for src/generate.js + src/bpe.js)
git clone https://github.com/mni-ml/transformer.git
cd transformer

# 2. Install the JS runtime
npm install

# 3. Download the checkpoint + tokenizer into ./out
hf download mni-ml/transformer model-final.json tokenizer.json --local-dir ./out

# 4. Generate
node src/generate.js out/model-final.json "<|endoftext|>" 400 0.9 out/tokenizer.json
```

CLI arguments to generate.js:

```bash
node src/generate.js <checkpoint> <prompt> <max_new_tokens> <temperature> <tokenizer_path>
```

⚠️ The 5th argument (tokenizer_path) is effectively required when using this public checkpoint. model-final.json internally records the path /app/data/tokenizer.json (the training container's path), which will not exist on your machine. Always pass out/tokenizer.json (or wherever you downloaded it) as the 5th arg.

Temperature 0 gives greedy decoding; values > 0 enable temperature sampling. The prompt is encoded with the BPE tokenizer, so any UTF-8 string works; `<|endoftext|>` is the only special token.
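The decoding rule can be sketched as follows. This is a minimal illustration of greedy vs. temperature sampling, not the actual code in src/generate.js:

```javascript
// Minimal sketch of the decoding rule described above (an illustration,
// not the src/generate.js implementation).
function sampleToken(logits, temperature, rand = Math.random) {
  if (temperature === 0) {
    // Greedy decoding: pick the highest-scoring token.
    return logits.indexOf(Math.max(...logits));
  }
  const scaled = logits.map(l => l / temperature);
  const maxL = Math.max(...scaled);                 // for numerical stability
  const exps = scaled.map(l => Math.exp(l - maxL));
  const sum = exps.reduce((a, b) => a + b, 0);
  let r = rand() * sum;
  for (let i = 0; i < exps.length; i++) {
    r -= exps[i];
    if (r <= 0) return i;
  }
  return exps.length - 1; // guard against floating-point drift
}
```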

GPU (optional)

If you install a matching @mni-ml/framework-* native package that exposes native.flashAttention:

```bash
node src/generate_gpu.js out/model-final.json "<|endoftext|>" 400 0.9 out/tokenizer.json
```
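A rough sketch of how such a capability check might look. The concrete package name and the backend labels here are assumptions for illustration, not the actual src/generate_gpu.js logic:

```javascript
// Hypothetical capability check, assuming an installed native addon exports
// a flashAttention function as described above.
function pickAttentionBackend(nativeModule) {
  if (nativeModule && typeof nativeModule.flashAttention === 'function') {
    return 'flash'; // GPU path (src/generate_gpu.js)
  }
  return 'naive';   // CPU fallback (src/generate.js)
}

let native = null;
try {
  // "@mni-ml/framework-cuda" is an example suffix for the "@mni-ml/framework-*"
  // pattern above; the real name depends on your platform.
  native = require('@mni-ml/framework-cuda');
} catch {
  // Native addon not installed — stay on the CPU path.
}
console.log(pickAttentionBackend(native)); // 'naive' unless the addon exists
```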

Quick sanity check

```bash
node src/generate.js out/model-final.json "Once upon a time" 100 0.8 out/tokenizer.json
```

Expected output style: short, simple, children's-story English (since the training corpus is TinyStories).

Intended use

Small research / educational model that demonstrates training a Transformer end-to-end in JavaScript. It is fluent on short children's-story-style English and is not a general-purpose chat or instruction model.

  • Suitable for: short-form story continuation, JS/Node learning demos, tokenizer experiments.
  • Not suitable for: factual Q&A, code generation, non-English text, long-context tasks (256-token window), safety-critical use.

Training data

TinyStories — a synthetic corpus of short English children's stories, originally generated by GPT-3.5 / GPT-4 and designed for training small language models. The BPE tokenizer in tokenizer.json was trained on the same corpus via scripts/prepare_tinystories.py in the source repo.

Training procedure

  • Framework: @mni-ml/framework v0.3.4 (Node.js)
  • Entry point: src/train.js (CPU) or src/train_gpu.js (GPU)
  • Objective: next-token cross-entropy

| Hyperparameter | Value |
|----------------|-------|
| Optimizer | AdamW |
| β₁, β₂ | 0.9, 0.95 |
| Weight decay | 0.1 |
| Max grad norm | 1.0 |
| Peak LR | 3e-4 |
| Min LR | 6e-5 |
| LR schedule | Linear warmup (200 steps) → cosine decay |
| Max iterations | 7,500 |
| Batch size | 8 |
| Gradient accumulation | 4 (effective batch 32) |
| Dropout | 0.1 (training only) |
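The warmup-plus-cosine schedule above can be sketched from the listed hyperparameters alone. This is a minimal reimplementation for illustration, not the actual src/train.js code:

```javascript
// Sketch of the LR schedule described above: linear warmup for 200 steps to
// the peak LR, then cosine decay to the minimum over the remaining iterations.
const PEAK_LR = 3e-4, MIN_LR = 6e-5, WARMUP = 200, MAX_ITERS = 7500;

function learningRate(step) {
  if (step < WARMUP) return PEAK_LR * (step + 1) / WARMUP; // linear warmup
  if (step >= MAX_ITERS) return MIN_LR;                    // floor after decay
  const progress = (step - WARMUP) / (MAX_ITERS - WARMUP); // 0 → 1
  const cosine = 0.5 * (1 + Math.cos(Math.PI * progress)); // 1 → 0
  return MIN_LR + cosine * (PEAK_LR - MIN_LR);
}
```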

Limitations and biases

  • Trained only on TinyStories, so outputs mimic simple children's stories and will hallucinate or produce nonsense for anything outside that domain.
  • TinyStories is itself GPT-generated, so any biases or artifacts of the generating models can propagate here.
  • 256-token context window is very short.
  • No RLHF, no instruction tuning, no safety alignment.
  • English-only.

License

MIT — see the source repository for details.

Citation

```bibtex
@misc{mni-ml-transformer,
  title  = {mni-ml/transformer: a 12M-parameter Transformer trained in Node.js},
  author = {mni-ml},
  year   = {2026},
  url    = {https://github.com/mni-ml/transformer}
}

@article{eldan2023tinystories,
  title   = {TinyStories: How Small Can Language Models Be and Still Speak Coherent English?},
  author  = {Eldan, Ronen and Li, Yuanzhi},
  journal = {arXiv preprint arXiv:2305.07759},
  year    = {2023}
}
```