Miczu

AI & ML interests

None yet

Recent Activity

upvoted an article about 3 hours ago

G2P Shrinks Speech Models

reacted to hexgrad's post with 🔥 about 3 hours ago

Wanted: Peak Data. I'm collecting audio data to train another TTS model:
+ AVM data: ChatGPT Advanced Voice Mode audio & text from source
+ Professional audio: Permissive (CC0, Apache, MIT, CC-BY)

This audio should *impress* most native speakers, not just barely pass their audio Turing tests. Professional-caliber means S or A-tier, not your average bloke off the street. Traditional TTS may not make the cut. Absolutely no low-fi microphone recordings like Common Voice.

The bar is much higher than last time, so there are no timelines yet and I expect it may take longer to collect such mythical data. Raising the bar means evicting quite a bit of old data, and voice/language availability may decrease. The theme is *quality* over quantity. I would rather have 1 hour of A/S-tier than 100 hours of mid data.

I have nothing to offer but the north star of a future Apache 2.0 TTS model, so prefer data that you *already have* and costs you *nothing extra* to send. Additionally, *all* the new data may be used to construct public, Apache 2.0 voicepacks, and if that arrangement doesn't work for you, no need to send any audio.

Last time I asked for horses; now I'm asking for unicorns. As of writing this post, I've currently got a few English & Chinese unicorns, but there is plenty of room in the stable. Find me over on Discord at `rzvzn`: https://discord.gg/QuGxSWBfQy

new activity 9 days ago

hexgrad/Kokoro-82M:Is v1.0 going to get onnx formatted models?

Organizations

None yet

Miczu's activity

liked a model 10 days ago

hexgrad/Kokoro-82M

Text-to-Speech • Updated 8 days ago • 271k downloads • 2.96k likes