TTS AGI

Enterprise

community

Verified

https://ttsarena.org

TTS-AGI

Activity Feed

AI & ML interests

Decentralized yet unified efforts to accelerate research for Open Text to Speech (TTS) systems!

Recent Activity

thomwolf authored a paper 17 days ago

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

pyp1 authored a paper about 1 month ago

ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models

thomwolf authored a paper 2 months ago

SmolVLM: Redefining small and efficient multimodal models

View all activity

TTS-AGI's activity

mrfakename

in TTS-AGI/TTS-Arena-V2 about 14 hours ago

How to add non-opensource model for pre-release evaluation?

#103 opened about 15 hours ago by

aluminumbox

multimodalart

posted an update 1 day ago

Post

1615

Self-Forcing - a real-time video distilled model from Wan 2.1 by @adobe is out, and they open sourced it 🐐

I've built a live real time demo on Spaces 📹💨

multimodalart/self-forcing

1 reply

reach-vb

posted an update 8 days ago

Post

1879

Excited to onboard FeatherlessAI on Hugging Face as an Inference Provider - they bring a fleet of 6,700+ LLMs on-demand on the Hugging Face Hub 🤯

Starting today, you'd be able to access all those LLMs (OpenAI compatible) on HF model pages and via OpenAI client libraries too! 💥

Go, play with it today: https://huggingface.co/blog/inference-providers-featherless

P.S. They're also bringing on more GPUs to support all your concurrent requests!

cbensimon

posted an update 9 days ago

Post

3007

🚀 ZeroGPU now supports PyTorch native quantization via torchao

While it hasn’t been battle-tested yet, Int8WeightOnlyConfig is already working flawlessly in our tests.

Let us know if you run into any issues — and we’re excited to see what the community will build!

import spaces
from diffusers import FluxPipeline
from torchao.quantization.quant_api import Int8WeightOnlyConfig, quantize_

pipeline = FluxPipeline.from_pretrained(...).to('cuda')
quantize_(pipeline.transformer, Int8WeightOnlyConfig()) # Or any other component(s)

@spaces.GPU
def generate(prompt: str):
    return pipeline(prompt).images[0]

5 replies

thomwolf

authored a paper 17 days ago

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published 18 days ago • 97

pyp1

authored a paper about 1 month ago

ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models

Paper • 2505.13444 • Published May 19 • 16

reach-vb

posted an update about 1 month ago

Post

3927

hey hey @mradermacher - VB from Hugging Face here, we'd love to onboard you over to our optimised xet backend! 💥

as you know we're in the process of upgrading our storage backend to xet (which helps us scale and offer blazingly fast upload/ download speeds too): https://huggingface.co/blog/xet-on-the-hub and now that we are certain that the backend can scale with even big models like Llama 4/ Qwen 3 - we;re moving to the next phase of inviting impactful orgs and users on the hub over as you are a big part of the open source ML community - we would love to onboard you next and create some excitement about it in the community too!

in terms of actual steps - it should be as simple as one of the org admins to join hf.co/join/xet - we'll take care of the rest.

p.s. you'd need to have a the latest hf_xet version of huggingface_hub lib but everything else should be the same: https://huggingface.co/docs/hub/storage-backends#using-xet-storage

p.p.s. this is fully backwards compatible so everything will work as it should! 🤗

16 replies

clefourrier

posted an update about 1 month ago

Post

721

Always surprised that so few people actually read the FineTasks blog, on
✨how to select training evals with the highest signal✨

If you're serious about training models without wasting compute on shitty runs, you absolutely should read it!!

An high signal eval actually tells you precisely, during training, how wel & what your model is learning, allowing you to discard the bad runs/bad samplings/...!

The blog covers in depth prompt choice, metrics, dataset, across languages/capabilities, and my fave section is "which properties should evals have"👌
(to know on your use case how to select the best evals for you)

Blog: HuggingFaceFW/blogpost-fine-tasks

2 replies

cbensimon

posted an update about 1 month ago

Post

5791

🚀 ZeroGPU medium size is now available as a power-user feature

Nothing too fancy for now—ZeroGPU Spaces still default to large (70GB VRAM)—but this paves the way for:
- 💰 size-based quotas / pricing (medium will offer significantly more usage than large)
- 🦣 the upcoming xlarge size (141GB VRAM)

You can as of now control GPU size via a Space variable. Accepted values:
- auto (future default)
- medium
- large (current default)

The auto mode checks total CUDA tensor size during startup:
- More than 30GB → large
- Otherwise → medium

3 replies

mrfakename

posted an update about 2 months ago

Post

3824

Hi everyone,

I just launched TTS Arena V2 - a platform for benchmarking TTS models by blind A/B testing. The goal is to make it easy to compare quality between open-source and commercial models, including conversational ones.

What's new in V2:

- **Conversational Arena**: Evaluate models like CSM-1B, Dia 1.6B, and PlayDialog in multi-turn settings
- **Personal Leaderboard**: Optional login to see which models you tend to prefer
- **Multi-speaker TTS**: Random voices per generation to reduce speaker bias
- **Performance Upgrade**: Rebuilt from Gradio → Flask. Much faster with fewer failed generations.
- **Keyboard Shortcuts**: Vote entirely via keyboard

Also added models like MegaTTS 3, Cartesia Sonic, and ElevenLabs' full lineup.

I'd love any feedback, feature suggestions, or ideas for models to include.

TTS-AGI/TTS-Arena-V2

4 replies

thomwolf

posted an update 2 months ago

Post

5237

If you've followed the progress of robotics in the past 18 months, you've likely noticed how robotics is increasingly becoming the next frontier that AI will unlock.

At Hugging Face—in robotics and across all AI fields—we believe in a future where AI and robots are open-source, transparent, and affordable; community-built and safe; hackable and fun. We've had so much mutual understanding and passion working with the Pollen Robotics team over the past year that we decided to join forces!

You can already find our open-source humanoid robot platform Reachy 2 on the Pollen website and the Pollen community and people here on the hub at

pollen-robotics

We're so excited to build and share more open-source robots with the world in the coming months!

1 reply

thomwolf

authored a paper 2 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 191

pcuenq

authored a paper 2 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 191

reach-vb

authored a paper 2 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 191

thomwolf

authored a paper 2 months ago

YourBench: Easy Custom Evaluation Sets for Everyone

Paper • 2504.01833 • Published Apr 2 • 21

mrfakename

posted an update 3 months ago

Post

2896

Papla P1 from Papla Media is now available on the TTS Arena!

Try out Papla's new ultra-realistic TTS model + compare it with other leading models on the TTS Arena: TTS-AGI/TTS-Arena

thomwolf

posted an update 3 months ago

Post

3542

The new DeepSite space is really insane for vibe-coders
enzostvs/deepsite

With the wave of vibe-coding-optimized LLMs like the latest open-source DeepSeek model (version V3-0324), you can basically prompt out-of-the-box and create any app and game in one-shot.

It feels so powerful to me, no more complex framework or under-the-hood prompt engineering to have a working text-to-app tool.

AI is eating the world and *open-source* AI is eating AI itself!

PS: and even more meta is that the DeepSite app and DeepSeek model are both fully open-source code => time to start recursively improve?

PPS: you still need some inference hosting unless you're running the 600B param model at home, so check the very nice list of HF Inference Providers for this model: deepseek-ai/DeepSeek-V3-0324

1 reply

osanseviero

authored a paper 3 months ago

Gemma 3 Technical Report

Paper • 2503.19786 • Published Mar 25 • 52

mrfakename

posted an update 3 months ago

Post

2847

GGUF quants (text-only) for the new Mistral Small 3.1 24B are now live:

mrfakename/mistral-small-3.1-24b-instruct-2503-gguf

mrfakename

posted an update 3 months ago

Post

2325

Converted the new Mistral Small 3.1 models to HF format (currently text-only, no vision):

Instruct: mrfakename/mistral-small-3.1-24b-instruct-2503-hf
Base: mrfakename/mistral-small-3.1-24b-base-2503-hf

GGUF quants coming soon!

AI & ML interests

Recent Activity

Team members 17

TTS-AGI's activity

How to add non-opensource model for pre-release evaluation?