ik committed
Commit ccf210d · verified · 1 Parent(s): 30a1124

Upload RVQ Stage-1 (2025-09-02)

README.md ADDED
@@ -0,0 +1,48 @@
+ ---
+ license: apache-2.0
+ language:
+ - tw
+ - ak
+ library_name: pytorch
+ tags:
+ - speechless
+ - rvq
+ - whisper
+ - twi
+ - akan
+ - vector-quantization
+ - semantic-tokens
+ ---
+
+ # Speechless TWI: Stage 1 (RVQ for Whisper Encoder)
+
+ A residual vector quantizer (RVQ) trained to discretize `openai/whisper-medium` encoder features into semantic tokens for **Twi/Akan** speech.
+
+ ## Files
+ - `rvq_final.pt`: final checkpoint; the RVQ state dict is stored under the `rvq` key
+ - `config_stage1.json`: training and model configuration parameters
+ - `rvq_wrapper.py`: small module defining `RVQWrapper`
+
+ ## Usage (example)
+ ```python
+ import json
+ import os
+ import sys
+
+ import torch
+ from huggingface_hub import hf_hub_download
+
+ REPO_ID = "ik/speechless-twi-stage1-rvq-whisper-medium"
+
+ # Fetch rvq_wrapper.py from the repo and make it importable (needs vector-quantize-pytorch).
+ sys.path.insert(0, os.path.dirname(hf_hub_download(REPO_ID, "rvq_wrapper.py")))
+ from rvq_wrapper import RVQWrapper
+
+ with open(hf_hub_download(REPO_ID, "config_stage1.json")) as f:
+     cfg = json.load(f)
+ ckpt = torch.load(hf_hub_download(REPO_ID, "rvq_final.pt"), map_location="cpu")
+
+ rvq = RVQWrapper(cfg["rvq_dim"], cfg["rvq_num_quantizers"], cfg["rvq_codebook_size"])
+ rvq.load_state_dict(ckpt["rvq"])  # RVQ weights live under the "rvq" key
+ rvq.eval()
+ ```
config_stage1.json ADDED
@@ -0,0 +1,19 @@
+ {
+   "cv_version": "fsicoli/twi",
+   "cv_lang": "tw",
+   "split_train": "train",
+   "split_eval": "validation",
+   "sample_rate": 16000,
+   "max_audio_seconds": 30.0,
+   "whisper_ckpt": "openai/whisper-medium",
+   "batch_size": 2,
+   "num_workers": 0,
+   "rvq_dim": 1024,
+   "rvq_num_quantizers": 8,
+   "rvq_codebook_size": 2048,
+   "rvq_commitment_weight": 0.25,
+   "lr": 0.001,
+   "epochs": 1,
+   "warmup_steps": 200,
+   "save_every": 1000
+ }
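The RVQ settings in this config determine the size of the token stream. The sketch below derives a rough token budget from them. It assumes the Whisper encoder's usual output rate of 50 feature frames per second (1500 frames for a 30-second window) and one code per quantizer per frame. The function name `token_budget` and the embedded config subset are illustrative and not part of the repository.

```python
import json
import math

# Subset of config_stage1.json relevant to the token stream (values copied from the diff).
CONFIG_JSON = """
{
  "max_audio_seconds": 30.0,
  "rvq_dim": 1024,
  "rvq_num_quantizers": 8,
  "rvq_codebook_size": 2048
}
"""

# Assumption: the Whisper encoder emits 50 feature frames per second of audio.
WHISPER_FRAMES_PER_SECOND = 50


def token_budget(cfg: dict) -> dict:
    """Derive per-frame and per-clip code counts from the RVQ settings."""
    frames = int(cfg["max_audio_seconds"] * WHISPER_FRAMES_PER_SECOND)
    bits_per_code = math.log2(cfg["rvq_codebook_size"])
    return {
        "frames_per_clip": frames,
        "codes_per_frame": cfg["rvq_num_quantizers"],
        "codes_per_clip": frames * cfg["rvq_num_quantizers"],
        "bits_per_frame": cfg["rvq_num_quantizers"] * bits_per_code,
    }


budget = token_budget(json.loads(CONFIG_JSON))
print(budget)
```

Under these assumptions a full 30-second window maps to 1500 frames of 8 codes each, and each frame carries 88 bits of index information.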
rvq_final.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8e35cc1e8ec5b18acb49e147a4d383c1ba725fe4ab09ca2349b72ad71f14de6b
+ size 142760458
rvq_step1000.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c4b1696a3d4c000aecdad7907e6072a4d0313c9ed2927f2c201867f83f5a8b00
+ size 142760587
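The two `.pt` entries are Git LFS pointers, not the checkpoints themselves: each records the SHA-256 digest and byte size of the real file. As a minimal sketch, a downloaded checkpoint could be checked against its pointer as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical; only the pointer text is taken from the diff.

```python
import hashlib
from pathlib import Path

# Pointer text for rvq_final.pt, copied from the diff.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8e35cc1e8ec5b18acb49e147a4d383c1ba725fe4ab09ca2349b72ad71f14de6b
size 142760458
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and digest both match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        # Read in chunks so large checkpoints are never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer(Path("rvq_final.pt"), pointer)` on a local copy would then confirm that the download is complete and unmodified.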
rvq_wrapper.py ADDED
@@ -0,0 +1,23 @@
+
+ import torch
+ import torch.nn as nn
+ from vector_quantize_pytorch import ResidualVQ
+
+ class RVQWrapper(nn.Module):
+     """LayerNorm + Linear projections around a ResidualVQ bottleneck."""
+
+     def __init__(self, dim, num_quantizers, codebook_size):
+         super().__init__()
+         self.ln_in = nn.LayerNorm(dim)
+         self.proj_in = nn.Linear(dim, dim)
+         self.rvq = ResidualVQ(dim=dim, num_quantizers=num_quantizers, codebook_size=codebook_size)
+         self.ln_out = nn.LayerNorm(dim)
+         self.proj_out = nn.Linear(dim, dim)
+
+     def forward(self, x):
+         # x: (batch, time, dim) encoder features
+         x = self.proj_in(self.ln_in(x))
+         # q: quantized features; indices: one code index per quantizer for each frame
+         q, indices, commit_loss = self.rvq(x)
+         y = self.proj_out(self.ln_out(q))
+         return y, indices, commit_loss
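The wrapper delegates the quantization itself to `ResidualVQ`. To illustrate the general idea of residual vector quantization, here is a toy sketch in pure Python so that it runs without PyTorch. It is not the library's implementation: the codebooks are random rather than learned, the sizes are tiny, and the names `nearest` and `residual_quantize` are made up for this example. Each stage picks the codeword nearest to what the previous stages left unexplained, so a frame ends up with one index per quantizer.

```python
import random


def nearest(codebook, vec):
    """Index of the codeword with the smallest squared Euclidean distance to vec."""
    def dist(code):
        return sum((a - b) ** 2 for a, b in zip(code, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def residual_quantize(vec, codebooks):
    """Quantize vec stage by stage; each stage encodes the previous stage's residual."""
    residual = list(vec)
    quantized = [0.0] * len(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        code = codebook[idx]
        indices.append(idx)
        # Accumulate the chosen codeword and remove it from the residual.
        quantized = [q + c for q, c in zip(quantized, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return quantized, indices, residual


# Toy sizes for illustration only; the config in this commit uses
# dim=1024, 8 quantizers, and 2048 codes per codebook.
rng = random.Random(0)
DIM, NUM_QUANTIZERS, CODEBOOK_SIZE = 4, 3, 16
codebooks = [
    [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(CODEBOOK_SIZE)]
    for _ in range(NUM_QUANTIZERS)
]
x = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
quantized, indices, residual = residual_quantize(x, codebooks)
print(indices)
```

The sum of the selected codewords plus the final residual reproduces the input exactly, which is why adding quantizer stages can refine the approximation once the codebooks are trained.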
twi_semantic_tokens_sample.json ADDED
The diff for this file is too large to render. See raw diff
 
twi_semantic_tokens_sample.jsonl ADDED
The diff for this file is too large to render. See raw diff