728 57 497

mrfakename PRO

mrfakename

https://mrfake.name/

AI & ML interests

LLMs, TTS, & Open Source

Recent Activity

new activity about 12 hours ago

mrfakename/MegaTTS3-Voice-Cloning:pricing

updated a dataset 1 day ago

mrfakename/laion-audio-small-processed

published a dataset 1 day ago

mrfakename/laion-audio-small-processed

View all activity

Organizations

replied to their post 2 months ago

Unfortunately not all models support this so I'm not sure this would be feasbile

posted an update 3 months ago

Post

5786

Hi everyone,

I just launched TTS Arena V2 - a platform for benchmarking TTS models by blind A/B testing. The goal is to make it easy to compare quality between open-source and commercial models, including conversational ones.

What's new in V2:

- **Conversational Arena**: Evaluate models like CSM-1B, Dia 1.6B, and PlayDialog in multi-turn settings
- **Personal Leaderboard**: Optional login to see which models you tend to prefer
- **Multi-speaker TTS**: Random voices per generation to reduce speaker bias
- **Performance Upgrade**: Rebuilt from Gradio → Flask. Much faster with fewer failed generations.
- **Keyboard Shortcuts**: Vote entirely via keyboard

Also added models like MegaTTS 3, Cartesia Sonic, and ElevenLabs' full lineup.

I'd love any feedback, feature suggestions, or ideas for models to include.

TTS-AGI/TTS-Arena-V2

6 replies

posted an update 4 months ago

Post

2977

Papla P1 from Papla Media is now available on the TTS Arena!

Try out Papla's new ultra-realistic TTS model + compare it with other leading models on the TTS Arena: TTS-AGI/TTS-Arena

posted an update 5 months ago

Post

2877

GGUF quants (text-only) for the new Mistral Small 3.1 24B are now live:

mrfakename/mistral-small-3.1-24b-instruct-2503-gguf

posted an update 5 months ago

Post

2349

Converted the new Mistral Small 3.1 models to HF format (currently text-only, no vision):

Instruct: mrfakename/mistral-small-3.1-24b-instruct-2503-hf
Base: mrfakename/mistral-small-3.1-24b-base-2503-hf

GGUF quants coming soon!

posted an update 6 months ago

Post

2627

I’m excited to introduce a new leaderboard UI + keyboard shortcuts on the TTS Arena!

The refreshed UI for the leaderboard is smoother and (hopefully) more intuitive. You can now view models based on a simpler win-rate percentage and exclude closed models.

In addition, the TTS Arena now supports keyboard shortcuts. This should make voting much more efficient as you can now vote without clicking anything!

In both the normal Arena and Battle Mode, press "r" to select a random text, Cmd/Ctrl + Enter to synthesize, and "a"/"b" to vote! View more details about keyboard shortcuts by pressing "?" (Shift + /) on the Arena.

Check out all the new updates on the TTS Arena:

TTS-AGI/TTS-Arena

1 reply

reacted to nyuuzyou's post with 🤯 8 months ago

Post

3012

its over

41 replies

replied to their post 8 months ago

Hi, do you see a limit in the number of voices I have 416 and it fails to load all of them. (scroll menu limit?)

I'm not sure if there's a set limit for the dropdown, but with that many voices, it might make sense to not use the dropdown but instead have a textbox to specify the path to the reference speaker.

replied to their post 8 months ago

I don't think that's supported by the model, but you could fine-tune it or clone a voice with emotions. (I am not the author of the model itself, just of the web demo)

replied to their post 8 months ago

Hi,
You can upload a WAV file to the voices folder. Then, in the app.py file, add the filename of the voice (without .wav) to the voicelist list. It should show up in the Gradio demo.

replied to their post 9 months ago

Hi,
I added:

import nltk
nltk.download('punkt_tab')

and it seems to resolve the issue for me. Have you changed any code from the original Space?
Thanks!

replied to their post 9 months ago

This comment has been hidden

replied to their post 9 months ago

Hi,
Sorry about the issues! Please try adding:

nltk.download('punkt_tab')

below the nltk.download() line – let me know if it works!

posted an update 10 months ago

Post

7850

I just released an unofficial demo for Moonshine ASR!

Moonshine is a fast, efficient, & accurate ASR model released by Useful Sensors. It's designed for on-device inference and licensed under the MIT license!

HF Space (unofficial demo): mrfakename/Moonshine
GitHub repo for Moonshine: https://github.com/usefulsensors/moonshine

replied to their post 10 months ago

Training itself would be pretty easy, but the main issue would be data. AFAIK there's not much data out there for other TTS models. I synthetically generated the StyleTTS 2 dataset as it's quite efficient but other models would require much more compute.

reacted to Jofthomas's post with 🔥 12 months ago

Post

7659

Everchanging Quest is out !

It is an LLM controlled Rogue-Like in which the LLM gets a markdown representation of the map, and should generate a JSON with the objective to fulfill on the map as well as the necessary objects and their placements.

Come test it on the space :
Jofthomas/Everchanging-Quest

2 replies

reacted to cdminix's post with 👍 about 1 year ago

Post

2253

Since new TTS (Text-to-Speech) systems are coming out what feels like every day, and it's currently hard to compare them, my latest project has focused on doing just that.

I was inspired by the TTS-AGI/TTS-Arena (definitely check it out if you haven't), which compares recent TTS system using crowdsourced A/B testing.

I wanted to see if we can also do a similar evaluation with objective metrics and it's now available here:
ttsds/benchmark
Anyone can submit a new TTS model, and I hope this can provide a way to get some information on which areas models perform well or poorly in.

The paper with all the details is available here: https://arxiv.org/abs/2407.12707