---
language:
- en
license: other
tags:
- nemo
- parakeet
- whisper
- qwen3
- ctranslate2
- automatic-speech-recognition
- text-generation
- air-traffic-control
- atc
- singapore
- military
pipeline_tag: automatic-speech-recognition
---
# ASTRA ATC Models
Fine-tuned models for Singapore military air traffic control, built for the [ASTRA](https://github.com/aether-raid) training simulator.
## Pipeline
```text
Audio --> VAD (Silero) --> ASR (Whisper or Parakeet) --> Rule Formatter --> Display Text

ASR output:    "camel climb flight level zero nine zero"
Display text:  "CAMEL climb FL090"
```
The production pipeline uses a deterministic rule-based formatter instead of the legacy LLM formatter.
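To illustrate what a deterministic formatter of this kind might look like, here is a minimal sketch that reproduces the single example above. It is not the ASTRA formatter: the function name `format_display`, the `DIGITS` table, and the two rules shown are assumptions made for illustration, and the real rule set is not described in this README.

```python
from typing import List, Set

# Hypothetical digit vocabulary; the real formatter's vocabulary is not documented here.
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "niner": "9",
}


def format_display(text: str, callsigns: Set[str]) -> str:
    """Apply two illustrative rules to normalized ASR text."""
    words = text.lower().split()
    out: List[str] = []
    i = 0
    while i < len(words):
        level = words[i + 2:i + 5]
        # Rule 1 (assumed): "flight level" + three digit words -> "FL###".
        if (words[i:i + 2] == ["flight", "level"]
                and len(level) == 3
                and all(w in DIGITS for w in level)):
            out.append("FL" + "".join(DIGITS[w] for w in level))
            i += 5
        # Rule 2 (assumed): known callsigns are upper-cased.
        elif words[i] in callsigns:
            out.append(words[i].upper())
            i += 1
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


print(format_display("camel climb flight level zero nine zero", {"camel"}))
# CAMEL climb FL090
```

Because every rule is a plain lookup or pattern match, the same input always produces the same display text, which is the property the rule formatter is chosen for.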
## Models
### [ASR/whisper/](./ASR/whisper) - Whisper Large v3 (Legacy CTranslate2 backend)
Fine-tuned for Singapore military ATC speech. Uses CTranslate2 float16 format for fast inference with [faster-whisper](https://github.com/SYSTRAN/faster-whisper).
| Metric | Value |
|--------|-------|
| WER | **0.66%** |
| Base model | `openai/whisper-large-v3` |
| Size | 2.9 GB |
| Runtime | `faster-whisper` / CTranslate2 |
### [ASR/parakeet/](./ASR/parakeet) - Parakeet-TDT 0.6B v2 (NeMo checkpoint)
Fine-tuned NeMo Parakeet model for Singapore military ATC speech. Published as a raw checkpoint together with the tokenizer artifacts required to restore it.
| Metric | Value |
|--------|-------|
| Validation WER | **0.72%** |
| Base model | `nvidia/parakeet-tdt-0.6b-v2` |
| Size | 7.0 GB |
| Runtime | `nemo_toolkit[asr]` |
### [LLM/](./LLM) - Qwen3-1.7B Display Formatter (Legacy)
> **Legacy.** Superseded by the deterministic rule formatter. Retained for reference only.
Converts normalized ASR output into structured ATC display text.
| Metric | Value |
|--------|-------|
| Exact match | **100%** (161/161) |
| Base model | `unsloth/Qwen3-1.7B` |
| Size | 3.3 GB |
## Architecture
```text
Audio --> VAD (Silero) --> ASR backend --> Post-processing --> Rule Formatter --> Display Text
```
| Component | Technology | Notes |
|-----------|------------|-------|
| VAD | Silero VAD | Shared frontend for both ASR backends |
| ASR (legacy) | Whisper Large v3 (CTranslate2) | Lower-memory legacy backend |
| ASR (current NeMo path) | Parakeet-TDT 0.6B v2 | Fine-tuned NeMo checkpoint |
| Formatter | Deterministic rules | Converts normalized speech to ATC display text |
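The stage order in the diagram can be sketched as a chain of callables. This is only an outline under stated assumptions: the class name `AtcPipeline`, its field names, and the stub stages are hypothetical and stand in for the real VAD, ASR, post-processing, and formatter components.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

Audio = List[float]  # placeholder type for raw audio samples


@dataclass
class AtcPipeline:
    """Hypothetical wiring of the four stages; not the ASTRA code."""
    vad: Callable[[Audio], Iterable[Audio]]   # splits audio into speech segments
    asr: Callable[[Audio], str]               # Whisper or Parakeet backend
    postprocess: Callable[[str], str]         # text normalization
    formatter: Callable[[str], str]           # rule-based display formatting

    def run(self, audio: Audio) -> List[str]:
        lines: List[str] = []
        for segment in self.vad(audio):
            normalized = self.postprocess(self.asr(segment))
            if normalized:  # skip segments that produce no text
                lines.append(self.formatter(normalized))
        return lines


# Stub stages, used only to show the data flow.
pipeline = AtcPipeline(
    vad=lambda audio: [audio],
    asr=lambda segment: " Camel climb ",
    postprocess=lambda text: text.strip().lower(),
    formatter=lambda text: text.upper(),
)
print(pipeline.run([0.0]))
```

Keeping each stage behind a plain callable is one way to let the Whisper and Parakeet backends share the same VAD frontend and formatter, as the table describes.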
## Domain
The models target Singapore military ATC: Tengah and Paya Lebar operations, military phraseology, 100+ callsigns, and approach, recovery, and emergency traffic.
## Training History
### ASR
| Run | WER | Base | Key Change |
|-----|-----|------|------------|
| ct2_run5 | 0.48% | jacktol/whisper-large-v3-finetuned-for-ATC | Initial fine-tune |
| ct2_run6 | 0.40% | jacktol/whisper-large-v3-finetuned-for-ATC | +augmentation, weight decay |
| ct2_run7 | 0.24% | jacktol/whisper-large-v3-finetuned-for-ATC | Frozen encoder, +50 real recordings |
| ct2_run8 | 0.66% | openai/whisper-large-v3 | Full retrain from base, enhanced augmentation |
| parakeet_atc | 0.72% | nvidia/parakeet-tdt-0.6b-v2 | NeMo fine-tune with ATC radio augmentation, best checkpoint at epoch 76 |
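For readers unfamiliar with the metric, word error rate is the word-level edit distance (substitutions, insertions, and deletions) divided by the number of reference words. The sketch below shows the standard per-utterance definition. The function name `word_error_rate` is an assumption, and this README does not state how the reported figures were computed or aggregated.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


ref = "camel climb flight level zero nine zero"
hyp = "camel climb flight level zero five zero"
print(f"{word_error_rate(ref, hyp):.2%}")  # one substitution in seven words
```

A single wrong digit in a seven-word transmission already gives a WER of about 14%, so sub-1% figures over a test set mean that the large majority of utterances are transcribed without any error.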
### LLM
| Run | Accuracy | Key Change |
|-----|----------|------------|
| llm_run3 | 98.1% (Qwen3-8B) | QLoRA 4-bit, 871 examples |
| llm_run4 | 100% (Qwen3-1.7B) | bf16 LoRA, 1,915 examples with ASR noise augmentation |
## Quick Start
### Whisper ASR
```python
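from faster_whisper import WhisperModel

# Path is relative to the repo root; adjust it if you downloaded with --local-dir.
model = WhisperModel("./ASR/whisper", device="cuda", compute_type="float16")
# transcribe() returns a lazy generator; decoding runs as the segments are consumed.
segments, info = model.transcribe("audio.wav", language="en", beam_size=5)
text = " ".join(seg.text.strip() for seg in segments)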
```
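`faster-whisper` also ships a built-in Silero VAD filter, enabled with `vad_filter=True`. The sketch below shows how the snippet above could use it. Whether ASTRA relies on this built-in filter or runs Silero VAD as a separate stage is not stated here, and the helper names and the 500 ms silence threshold are assumptions.

```python
def load_model(path: str = "./ASR/whisper"):
    # Imported lazily so the helper below can be exercised without a GPU or model.
    from faster_whisper import WhisperModel
    return WhisperModel(path, device="cuda", compute_type="float16")


def transcribe_with_vad(model, audio_path: str) -> str:
    """Transcribe one file, dropping non-speech audio before decoding."""
    segments, _info = model.transcribe(
        audio_path,
        language="en",
        beam_size=5,
        vad_filter=True,  # built-in Silero VAD
        vad_parameters={"min_silence_duration_ms": 500},  # assumed threshold
    )
    return " ".join(seg.text.strip() for seg in segments)
```

Typical use would be `transcribe_with_vad(load_model(), "audio.wav")`.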
### Parakeet ASR
See [ASR/parakeet/README.md](./ASR/parakeet/README.md) for the NeMo restore example and tokenizer artifact requirements.
## Download
```bash
# Full repo
huggingface-cli download aether-raid/astra-atc-models --local-dir ./models
# Whisper ASR only
huggingface-cli download aether-raid/astra-atc-models --include "ASR/whisper/*" --local-dir ./models
# Parakeet ASR only
huggingface-cli download aether-raid/astra-atc-models --include "ASR/parakeet/*" --local-dir ./models
# LLM only (legacy)
huggingface-cli download aether-raid/astra-atc-models --include "LLM/*" --local-dir ./models
```
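The same downloads can be scripted from Python with `huggingface_hub.snapshot_download`, using `allow_patterns` in place of `--include`. This is a sketch: the `SUBSETS` mapping and the helper names are illustrative and simply mirror the CLI commands above.

```python
from typing import List, Optional

REPO_ID = "aether-raid/astra-atc-models"

# Illustrative names for the three sub-directories used in the CLI examples.
SUBSETS = {
    "whisper": "ASR/whisper/*",
    "parakeet": "ASR/parakeet/*",
    "llm": "LLM/*",  # legacy
}


def patterns_for(subset: Optional[str]) -> Optional[List[str]]:
    """Return the allow_patterns list for a subset, or None for the full repo."""
    return None if subset is None else [SUBSETS[subset]]


def download(subset: Optional[str] = None, local_dir: str = "./models") -> str:
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=patterns_for(subset),
        local_dir=local_dir,
    )
```

For example, `download("whisper")` fetches only the Whisper files, and `download()` fetches the full repository.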