Ansu/HiFiGAN-Basque-Maider-Antton


	---
	language:
	- eu
	base_model:
	- speechbrain/tts-hifigan-unit-hubert-l6-k100-ljspeech
	library_name: speechbrain
	---
	# Basque Unit-HiFiGAN Vocoder (Voices: Maider & Antton)
	## Model Summary

	This repository provides a Unit-HiFiGAN vocoder trained to synthesize high-fidelity Basque speech from discrete HuBERT-derived unit sequences. The model supports two speaker identities, Maider and Antton, using learned speaker-conditioning embeddings. It is compatible with HuBERT features extracted from layer 9 and clustered using a KMeans (k=1000) quantizer.

	The vocoder is designed for unit-based text-to-speech, voice conversion, and speech synthesis research in Basque. It reconstructs waveform audio from sequences of discrete unit IDs and optional speaker embeddings.
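
The quantization step turns continuous HuBERT features into the discrete unit IDs the vocoder consumes. Each layer-9 frame vector is replaced by the index of its nearest KMeans centroid, which gives one integer in [0, 1000) per frame. The NumPy sketch below reimplements that nearest-centroid assignment for illustration. It assumes the quantizer uses Euclidean distance, as scikit-learn's `KMeans.predict` does. `assign_units` and the toy centroids are hypothetical and are not part of this repository.

```python
import numpy as np


def assign_units(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Map each frame (row of `features`) to the index of its nearest centroid.

    features:  [num_frames, dim] HuBERT hidden states (e.g. layer 9)
    centroids: [k, dim] KMeans cluster centers (k=1000 for this model)
    returns:   [num_frames] integer unit IDs in [0, k)
    """
    # Squared Euclidean distances via ||x||^2 - 2 x.c + ||c||^2, shape [num_frames, k]
    dists = (
        (features ** 2).sum(axis=1, keepdims=True)
        - 2.0 * features @ centroids.T
        + (centroids ** 2).sum(axis=1)
    )
    # Index of the closest centroid for every frame
    return dists.argmin(axis=1)


# Toy example: 3 centroids in 2-D and 4 frames
centroids = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
frames = np.array([[0.5, -0.2], [9.0, 1.0], [1.0, 8.5], [0.1, 0.1]])
units = assign_units(frames, centroids)
print(units.tolist())  # [0, 1, 2, 0]
```

With the released model, `centroids` would be the fitted cluster centers (`kmeans.cluster_centers_` for a scikit-learn object). The result should then agree with `kmeans.predict(features)` apart from exact distance ties.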

	## Key Features

	- Voices: Maider and Antton
	- Architecture: Unit-HiFiGAN (SpeechBrain implementation)
	- Input: Discrete HuBERT units (1D sequence of cluster IDs)
	- Output: 16 kHz Basque speech waveform
	- Speaker conditioning: Single-speaker or multi-speaker inference via speaker embeddings
	- Compatible encoders: Basque-finetuned HuBERT (layer 9 hidden states → KMeans, k=1000)
	- Use cases: Basque TTS research, unit-based synthesis, voice conversion, controllable speaker identity
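
The vocoder takes one unit per HuBERT frame, so the number of units in a clip follows from HuBERT's convolutional feature encoder. The sketch below assumes the standard HuBERT encoder layout: kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2. That is one frame every 320 samples, or 20 ms at 16 kHz. The sketch applies the unpadded convolution length formula layer by layer. The 320-sample vocoder hop used to estimate output length is an assumption not stated in this card; check the upsampling factors in the vocoder's hyperparameter file.

```python
# Assumed standard HuBERT convolutional feature-encoder layout
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)

# Assumption: vocoder hop size (product of its upsampling factors); not stated in this card
VOCODER_HOP = 320
SAMPLE_RATE = 16000


def num_units(num_samples: int) -> int:
    """Number of HuBERT frames (= discrete units) produced for a 16 kHz clip."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Unpadded 1-D convolution: floor((L - k) / s) + 1
        length = (length - kernel) // stride + 1
    # Clips shorter than the 400-sample receptive field yield no frames
    return max(length, 0)


def output_samples(units: int, hop: int = VOCODER_HOP) -> int:
    """Approximate waveform length (in samples) the vocoder returns for `units` units."""
    return units * hop


n = num_units(SAMPLE_RATE)               # one second of audio
print(n)                                 # 49
print(output_samples(n))                 # 15680
print(output_samples(n) / SAMPLE_RATE)   # 0.98
```

Under these assumptions, one second of input yields 49 units and about 0.98 s of output. The shortfall comes from the encoder's 400-sample receptive field, not from the vocoder.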

	## How to Use

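	Install SpeechBrain together with the other packages the example below imports. scikit-learn is listed on the assumption that the KMeans file is a scikit-learn model, as its `predict` call suggests.
	```bash
	pip install speechbrain transformers torchaudio joblib huggingface_hub scikit-learn
	```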
	The minimal example below extracts discrete units from a recording with HuBERT (layer 9) and KMeans. It then resynthesizes the speech with the Maider voice:

	```python
	import torch
	import torchaudio
	import joblib
	import numpy as np
	from transformers import Wav2Vec2FeatureExtractor, HubertModel
	from speechbrain.inference.vocoders import UnitHIFIGAN
	from huggingface_hub import hf_hub_download

	DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
	SR = 16000        # HuBERT input rate and vocoder output rate
	HUBERT_LAYER = 9  # hidden_states[0] is the transformer input, so index 9 is the output of layer 9

	# 1. Load HuBERT (only the feature extractor is needed to prepare raw audio)
	processor = Wav2Vec2FeatureExtractor.from_pretrained("Ansu/HiFiGAN-Basque-Maider-Antton")
	hubert = HubertModel.from_pretrained("Ansu/HiFiGAN-Basque-Maider-Antton").to(DEVICE).eval()

	# 2. Load KMeans (k=1000, fitted on layer-9 features)
	kmeans_path = hf_hub_download("Ansu/HiFiGAN-Basque-Maider-Antton", "kmeans/basque_hubert_k1000_L9.pt")
	kmeans = joblib.load(kmeans_path)

	# 3. Load vocoder (replace the placeholder with the repo or local folder holding the vocoder's hyperparameters and checkpoint)
	vocoder = UnitHIFIGAN.from_hparams(
	    source="your-vocoder-repo",
	    run_opts={"device": DEVICE},
	).eval()

	# 4. Load audio, mix down to mono and resample to 16 kHz
	wav, sr = torchaudio.load("example.wav")
	wav = wav.mean(dim=0)
	wav = torchaudio.functional.resample(wav, sr, SR)

	# 5. HuBERT → units
	inputs = processor(wav.numpy(), sampling_rate=SR, return_tensors="pt")
	inputs = {k: v.to(DEVICE) for k, v in inputs.items()}

	with torch.no_grad():
	    hidden = hubert(**inputs, output_hidden_states=True).hidden_states[HUBERT_LAYER]

	features = hidden.squeeze(0).cpu().numpy()  # [frames, dim]
	unit_ids = kmeans.predict(features)         # [frames], IDs in [0, 1000)
	units = torch.as_tensor(unit_ids, dtype=torch.long).unsqueeze(0).unsqueeze(-1).to(DEVICE)  # [1, frames, 1]

	# 6. Speaker embedding (Maider here; load the Antton embedding instead for the other voice)
	spk_emb = torch.as_tensor(
	    np.load("speaker_embeddings/maider.npy"), dtype=torch.float32
	).unsqueeze(0).to(DEVICE)  # [1, emb_dim] if the file stores a 1-D vector

	# 7. Vocoder decode (SpeechBrain's UnitHIFIGAN takes the speaker embedding as `spk`)
	with torch.no_grad():
	    wav_out = vocoder.decode_batch(units, spk=spk_emb)  # [batch, 1, time]

	torchaudio.save("output_maider.wav", wav_out[0].cpu(), SR)  # torchaudio.save expects [channels, time]
	print("Saved: output_maider.wav")
	```
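
Before calling `decode_batch`, it can help to confirm that the inputs have the layout the example builds. The units should have shape [batch, time, 1] with integer IDs in [0, 1000), and there should be one speaker embedding per batch item. The helper below is a hypothetical NumPy sketch that works on arrays, so convert tensors with `.cpu().numpy()` first. The minimum length of 3 units reflects a check in recent SpeechBrain versions of `UnitHIFIGAN`; treat it as an assumption to verify against your installed version.

```python
import numpy as np

NUM_CLUSTERS = 1000  # KMeans k used by this model
MIN_UNITS = 3        # assumption: recent SpeechBrain UnitHIFIGAN rejects shorter sequences


def check_vocoder_inputs(units: np.ndarray, spk_emb: np.ndarray) -> None:
    """Raise ValueError if inputs deviate from the [batch, time, 1] / [batch, dim] layout."""
    if not np.issubdtype(units.dtype, np.integer):
        raise ValueError(f"unit IDs must be integers, got dtype {units.dtype}")
    if units.ndim != 3 or units.shape[-1] != 1:
        raise ValueError(f"units must have shape [batch, time, 1], got {units.shape}")
    if units.shape[1] < MIN_UNITS:
        raise ValueError(f"need at least {MIN_UNITS} units, got {units.shape[1]}")
    if units.min() < 0 or units.max() >= NUM_CLUSTERS:
        raise ValueError(f"unit IDs must lie in [0, {NUM_CLUSTERS})")
    if spk_emb.ndim != 2 or spk_emb.shape[0] != units.shape[0]:
        raise ValueError(
            f"spk_emb must have shape [batch, dim] with batch={units.shape[0]}, got {spk_emb.shape}"
        )


# Arrays shaped like the tensors in the example; 192 is a placeholder embedding size
units = np.random.randint(0, NUM_CLUSTERS, size=(1, 49, 1))
spk_emb = np.zeros((1, 192), dtype=np.float32)
check_vocoder_inputs(units, spk_emb)  # passes silently
```

In the inference example this would be called as `check_vocoder_inputs(units.cpu().numpy(), spk_emb.cpu().numpy())`, just before `vocoder.decode_batch`.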