---
language:
  - en
license: other
tags:
  - nemo
  - parakeet
  - whisper
  - qwen3
  - ctranslate2
  - automatic-speech-recognition
  - text-generation
  - air-traffic-control
  - atc
  - singapore
  - military
pipeline_tag: automatic-speech-recognition
---

# ASTRA ATC Models

Fine-tuned models for Singapore military air traffic control, built for the [ASTRA](https://github.com/aether-raid) training simulator.

## Pipeline

```text
Audio  -->  VAD (Silero)  -->  ASR (Whisper or Parakeet)  -->  Rule Formatter  -->  Display Text
                               "camel climb flight level zero nine zero"
                                                                                  "CAMEL climb FL090"
```

The production pipeline uses a deterministic rule-based formatter instead of the legacy LLM formatter.
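
As a rough illustration of what a deterministic formatter rule could look like, the sketch below reproduces the example from the diagram above. The rule tables, the `format_display` name, and the three-digit flight-level rule are assumptions made for illustration; the actual ASTRA formatter rules are not included in this README.

```python
import re

# Hypothetical rule tables; the real formatter's rules are not shown in this README.
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "niner": "9",
}
CALLSIGNS = {"camel"}  # assumed subset of the 100+ callsigns

DIGIT_WORD = "|".join(DIGITS)
# "flight level" followed by exactly three spoken digits
FLIGHT_LEVEL = re.compile(rf"\bflight level((?: (?:{DIGIT_WORD})){{3}})\b")


def format_display(text: str) -> str:
    """Convert normalized ASR text to display text using fixed rules."""

    def flight_level(match: re.Match) -> str:
        digits = "".join(DIGITS[word] for word in match.group(1).split())
        return f"FL{digits}"

    # Rule 1: collapse spoken flight levels into FLnnn
    text = FLIGHT_LEVEL.sub(flight_level, text.lower().strip())
    # Rule 2: upper-case known callsigns, leave every other token unchanged
    words = [w.upper() if w in CALLSIGNS else w for w in text.split()]
    return " ".join(words)


print(format_display("camel climb flight level zero nine zero"))
```

Because every rule is a fixed lookup or regular expression, the same input always yields the same display text, which is the property that distinguishes this approach from the legacy LLM formatter.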

## Models

### [ASR/whisper/](./ASR/whisper) - Whisper Large v3 (Legacy CTranslate2 backend)

Fine-tuned for Singapore military ATC speech. Uses CTranslate2 float16 format for fast inference with [faster-whisper](https://github.com/SYSTRAN/faster-whisper).

| Metric | Value |
|--------|-------|
| WER | **0.66%** |
| Base model | `openai/whisper-large-v3` |
| Size | 2.9 GB |
| Runtime | `faster-whisper` / CTranslate2 |

### [ASR/parakeet/](./ASR/parakeet) - Parakeet-TDT 0.6B v2 (NeMo checkpoint)

Fine-tuned NeMo Parakeet model for Singapore military ATC speech. Published as a raw checkpoint together with the tokenizer artifacts required to restore it.

| Metric | Value |
|--------|-------|
| Validation WER | **0.72%** |
| Base model | `nvidia/parakeet-tdt-0.6b-v2` |
| Size | 7.0 GB |
| Runtime | `nemo_toolkit[asr]` |

### [LLM/](./LLM) - Qwen3-1.7B Display Formatter (Legacy)

> **Legacy.** Superseded by the deterministic rule formatter. Retained for reference only.

Converts normalized ASR output into structured ATC display text.

| Metric | Value |
|--------|-------|
| Exact match | **100%** (161/161) |
| Base model | `unsloth/Qwen3-1.7B` |
| Size | 3.3 GB |

## Architecture

```text
Audio --> VAD (Silero) --> ASR backend --> Post-processing --> Rule Formatter --> Display Text
```

| Component | Technology | Notes |
|-----------|------------|-------|
| VAD | Silero VAD | Shared frontend for both ASR backends |
| ASR (legacy) | Whisper Large v3 (CTranslate2) | Lower-memory legacy backend |
| ASR (current NeMo path) | Parakeet-TDT 0.6B v2 | Fine-tuned NeMo checkpoint |
| Formatter | Deterministic rules | Converts normalized speech to ATC display text |
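
The table above implies that the ASR backend is swappable behind a shared VAD frontend. One way to sketch that wiring is shown below, assuming each stage is a plain callable. All names here (`run_pipeline`, `normalize`, `stub_asr`) are hypothetical stand-ins and not part of the ASTRA codebase; the stub replaces a real Whisper or Parakeet call.

```python
from typing import Callable, Iterable, List

Transcriber = Callable[[bytes], str]  # speech segment -> raw transcript
TextStage = Callable[[str], str]      # text -> text (post-processing, formatter)


def run_pipeline(
    speech_segments: Iterable[bytes],
    transcribe: Transcriber,
    text_stages: List[TextStage],
) -> List[str]:
    """Run each VAD segment through the ASR backend, then each text stage in order."""
    results = []
    for segment in speech_segments:
        text = transcribe(segment)
        for stage in text_stages:
            text = stage(text)
        results.append(text)
    return results


def normalize(text: str) -> str:
    """Hypothetical post-processing step: lower-case and collapse whitespace."""
    return " ".join(text.lower().split())


def stub_asr(segment: bytes) -> str:
    """Stand-in for a Whisper or Parakeet call; returns a fixed transcript."""
    return "  Camel climb flight level zero nine zero "


print(run_pipeline([b"\x00\x01"], stub_asr, [normalize]))
```

Switching backends in this sketch means passing a different `transcribe` callable; the VAD segments and the text stages stay the same.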

## Domain

The models target Singapore military ATC: Tengah and Paya Lebar operations, military phraseology, 100+ callsigns, and approach, recovery, and emergency traffic.

## Training History

### ASR

| Run | WER | Base | Key Change |
|-----|-----|------|------------|
| ct2_run5 | 0.48% | jacktol/whisper-large-v3-finetuned-for-ATC | Initial fine-tune |
| ct2_run6 | 0.40% | jacktol/whisper-large-v3-finetuned-for-ATC | +augmentation, weight decay |
| ct2_run7 | 0.24% | jacktol/whisper-large-v3-finetuned-for-ATC | Frozen encoder, +50 real recordings |
| ct2_run8 | 0.66% | openai/whisper-large-v3 | Full retrain from base, enhanced augmentation |
| parakeet_atc | 0.72% | nvidia/parakeet-tdt-0.6b-v2 | NeMo fine-tune with ATC radio augmentation, best checkpoint at epoch 76 |
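
For readers unfamiliar with the metric, the sketch below shows the standard word error rate definition: word-level edit distance divided by the number of reference words. It is an assumption that the figures in the table follow this definition; the evaluation script and any text normalization applied before scoring are not part of this README.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "camel climb flight level zero nine zero"
hypothesis = "camel climb flight level zero five zero"
print(f"{word_error_rate(reference, hypothesis):.2%}")  # one substitution in seven words
```

Corpus-level WER is commonly computed by summing errors and reference words over all utterances rather than averaging per-utterance rates, so a single short utterance like this one is only illustrative.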

### LLM

| Run | Model | Accuracy | Key Change |
|-----|-------|----------|------------|
| llm_run3 | Qwen3-8B | 98.1% | QLoRA 4-bit, 871 examples |
| llm_run4 | Qwen3-1.7B | 100% | bf16 LoRA, 1,915 examples with ASR noise augmentation |

## Quick Start

### Whisper ASR

```python
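from faster_whisper import WhisperModel

# Load the CTranslate2 model directory downloaded from this repo
model = WhisperModel("./ASR/whisper", device="cuda", compute_type="float16")

# `segments` is a lazy generator; decoding runs as it is consumed below
segments, info = model.transcribe("audio.wav", language="en", beam_size=5)
text = " ".join(seg.text.strip() for seg in segments)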
```

### Parakeet ASR

See [ASR/parakeet/README.md](./ASR/parakeet/README.md) for the NeMo restore example and tokenizer artifact requirements.

## Download

```bash
# Full repo
huggingface-cli download aether-raid/astra-atc-models --local-dir ./models

# Whisper ASR only
huggingface-cli download aether-raid/astra-atc-models --include "ASR/whisper/*" --local-dir ./models

# Parakeet ASR only
huggingface-cli download aether-raid/astra-atc-models --include "ASR/parakeet/*" --local-dir ./models

# LLM only (legacy)
huggingface-cli download aether-raid/astra-atc-models --include "LLM/*" --local-dir ./models
```
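
The same selective downloads can be done from Python. The sketch below uses `snapshot_download` from `huggingface_hub` with its `allow_patterns` and `local_dir` parameters; the `PATTERNS` table and the `download` helper are hypothetical conveniences, not part of this repo.

```python
from typing import Optional

REPO_ID = "aether-raid/astra-atc-models"

# Glob patterns mirroring the --include arguments in the CLI examples above
PATTERNS = {
    "whisper": "ASR/whisper/*",
    "parakeet": "ASR/parakeet/*",
    "llm": "LLM/*",  # legacy formatter
}


def download(component: Optional[str] = None, local_dir: str = "./models") -> str:
    """Download one component, or the full repo when component is None."""
    # Imported lazily so the pattern table can be inspected without the package installed
    from huggingface_hub import snapshot_download

    allow = None if component is None else [PATTERNS[component]]
    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=allow,
        local_dir=local_dir,
    )
```

Calling `download("whisper")` would fetch only the Whisper directory, and `download()` the full repo.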