Hugging Face Internal Testing Organization

company
Activity Feed

AI & ML interests

None defined yet.

Recent Activity

dn6Β  updated a model about 4 hours ago
hf-internal-testing/tiny-flux-transformer
dn6Β  published a model about 4 hours ago
hf-internal-testing/tiny-flux-transformer
View all activity

hf-internal-testing's activity

albertvillanovaΒ 
posted an update about 15 hours ago
view post
Post
421
πŸš€ Introducing @huggingface Open Deep-ResearchπŸ’₯

In just 24 hours, we built an open-source agent that:
βœ… Autonomously browse the web
βœ… Search, scroll & extract info
βœ… Download & manipulate files
βœ… Run calculations on data

55% on GAIA validation set! Help us improve it!πŸ’‘
https://huggingface.co/blog/open-deep-research
  • 1 reply
Β·
sayakpaulΒ 
posted an update 6 days ago
view post
Post
1721
We have been cooking a couple of fine-tuning runs on CogVideoX with finetrainers, smol datasets, and LoRA to generate cool video effects like crushing, dissolving, etc.

We are also releasing a LoRA extraction utility from a fully fine-tuned checkpoint. I know that kind of stuff has existed since eternity, but the quality on video models was nothing short of spectacular. Below are some links:

* Models and datasets: https://huggingface.co/finetrainers
* finetrainers: https://github.com/a-r-r-o-w/finetrainers
* LoRA extraction: https://github.com/huggingface/diffusers/blob/main/scripts/extract_lora_from_model.py
  • 1 reply
Β·
sayakpaulΒ 
posted an update 9 days ago
view post
Post
1891
We have authored a post to go over the state of video generation in the Diffusers ecosystem 🧨

We cover the models supported, the knobs of optims our users can fire, fine-tuning, and more πŸ”₯

5-6GBs for HunyuanVideo, sky is the limit 🌌 πŸ€—
https://huggingface.co/blog/video_gen
lewtunΒ 
posted an update 11 days ago
view post
Post
9805
We are reproducing the full DeepSeek R1 data and training pipeline so everybody can use their recipe. Instead of doing it in secret we can do it together in the open!

πŸ§ͺ Step 1: replicate the R1-Distill models by distilling a high-quality reasoning corpus from DeepSeek-R1.

🧠 Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code.

πŸ”₯ Step 3: show we can go from base model -> SFT -> RL via multi-stage training.

Follow along: https://github.com/huggingface/open-r1
Β·
tomaarsenΒ 
posted an update 13 days ago
view post
Post
1594
I just released Sentence Transformers v3.4.0, featuring a memory leak fix, compatibility between the powerful Cached... losses and the Matryoshka loss modifier, and a bunch of fixes & small features.

πŸͺ† Matryoshka & Cached loss compatibility
It is now possible to combine the powerful Cached... losses (which use in-batch negatives & a caching mechanism to allow for endless batch size & negatives) with the Matryoshka loss modifier which modifies a base loss such that it is trained not only on the maximum dimensionality (e.g. 1024 dimensions), but also on many lower dimensions (e.g. 768, 512, 256, 128, 64, 32).
After training, these models' embeddings can be truncated for faster retrieval, etc.

🎞️ Resolve memory leak when Model and Trainer are reinitialized
Due to a circular dependency between Trainer -> Model -> ModelCardData -> Trainer, deleting both the trainer & model still didn't free up the memory.
This led to a memory leak in scripts where you repeatedly do so.

βž• New Features
Many new small features, e.g. multi-GPU support for 'mine_hard_negatives', a 'margin' parameter to TripletEvaluator, and Matthews Correlation Coefficient in the BinaryClassificationEvaluator.

πŸ› Bug Fixes
Also a bunch of fixes, for example that subsequent batches were not sorted when using the "no_duplicates" batch sampler. See the release notes for more details.

Full release notes: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.4.0

Big thanks to all community members who assisted in this release. 10 folks with their first contribution this time around!
XenovaΒ 
posted an update 20 days ago
view post
Post
4524
Introducing Kokoro.js, a new JavaScript library for running Kokoro TTS, an 82 million parameter text-to-speech model, 100% locally in the browser w/ WASM. Powered by πŸ€— Transformers.js. WebGPU support coming soon!
πŸ‘‰ npm i kokoro-js πŸ‘ˆ

Try it out yourself: webml-community/kokoro-web
Link to models/samples: onnx-community/Kokoro-82M-ONNX

You can get started in just a few lines of code!
import { KokoroTTS } from "kokoro-js";

const tts = await KokoroTTS.from_pretrained(
  "onnx-community/Kokoro-82M-ONNX",
  { dtype: "q8" }, // fp32, fp16, q8, q4, q4f16
);

const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text,
  { voice: "af_sky" }, // See `tts.list_voices()`
);
audio.save("audio.wav");

Huge kudos to the Kokoro TTS community, especially taylorchu for the ONNX exports and Hexgrad for the amazing project! None of this would be possible without you all! πŸ€—

The model is also extremely resilient to quantization. The smallest variant is only 86 MB in size (down from the original 326 MB), with no noticeable difference in audio quality! 🀯
Β·
tomaarsenΒ 
posted an update 21 days ago
view post
Post
4539
🏎️ Today I'm introducing a method to train static embedding models that run 100x to 400x faster on CPU than common embedding models, while retaining 85%+ of the quality! Including 2 fully open models: training scripts, datasets, metrics.

We apply our recipe to train 2 Static Embedding models that we release today! We release:
2️⃣ an English Retrieval model and a general-purpose Multilingual similarity model (e.g. classification, clustering, etc.), both Apache 2.0
🧠 my modern training strategy: ideation -> dataset choice -> implementation -> evaluation
πŸ“œ my training scripts, using the Sentence Transformers library
πŸ“Š my Weights & Biases reports with losses & metrics
πŸ“• my list of 30 training and 13 evaluation datasets

The 2 Static Embedding models have the following properties:
🏎️ Extremely fast, e.g. 107500 sentences per second on a consumer CPU, compared to 270 for 'all-mpnet-base-v2' and 56 for 'gte-large-en-v1.5'
0️⃣ Zero active parameters: No Transformer blocks, no attention, not even a matrix multiplication. Super speed!
πŸ“ No maximum sequence length! Embed texts at any length (note: longer texts may embed worse)
πŸ“ Linear instead of exponential complexity: 2x longer text takes 2x longer, instead of 2.5x or more.
πŸͺ† Matryoshka support: allow you to truncate embeddings with minimal performance loss (e.g. 4x smaller with a 0.56% perf. decrease for English Similarity tasks)

Check out the full blogpost if you'd like to 1) use these lightning-fast models or 2) learn how to train them with consumer-level hardware: https://huggingface.co/blog/static-embeddings

The blogpost contains a lengthy list of possible advancements; I'm very confident that our 2 models are only the tip of the iceberg, and we may be able to get even better performance.

Alternatively, check out the models:
* sentence-transformers/static-retrieval-mrl-en-v1
* sentence-transformers/static-similarity-mrl-multilingual-v1
  • 1 reply
Β·
albertvillanovaΒ 
posted an update 29 days ago
lewtunΒ 
posted an update about 1 month ago
view post
Post
3798
I was initially pretty sceptical about Meta's Coconut paper [1] because the largest perf gains were reported on toy linguistic problems. However, these results on machine translation are pretty impressive!

https://x.com/casper_hansen_/status/1875872309996855343

Together with the recent PRIME method [2] for scaling RL, reasoning for open models is looking pretty exciting for 2025!

[1] Training Large Language Models to Reason in a Continuous Latent Space (2412.06769)
[2] https://huggingface.co/blog/ganqu/prime
XenovaΒ 
posted an update about 1 month ago
view post
Post
8184
First project of 2025: Vision Transformer Explorer

I built a web app to interactively explore the self-attention maps produced by ViTs. This explains what the model is focusing on when making predictions, and provides insights into its inner workings! 🀯

Try it out yourself! πŸ‘‡
webml-community/attention-visualization

Source code: https://github.com/huggingface/transformers.js-examples/tree/main/attention-visualization
tomaarsenΒ 
posted an update about 1 month ago
view post
Post
2977
That didn't take long! Nomic AI has finetuned the new ModernBERT-base encoder model into a strong embedding model for search, classification, clustering and more!

Details:
πŸ€– Based on ModernBERT-base with 149M parameters.
πŸ“Š Outperforms both nomic-embed-text-v1 and nomic-embed-text-v1.5 on MTEB!
🏎️ Immediate FA2 and unpacking support for super efficient inference.
πŸͺ† Trained with Matryoshka support, i.e. 2 valid output dimensionalities: 768 and 256.
➑️ Maximum sequence length of 8192 tokens!
2️⃣ Trained in 2 stages: unsupervised contrastive data -> high quality labeled datasets.
βž• Integrated in Sentence Transformers, Transformers, LangChain, LlamaIndex, Haystack, etc.
πŸ›οΈ Apache 2.0 licensed: fully commercially permissible

Try it out here: nomic-ai/modernbert-embed-base

Very nice work by Zach Nussbaum and colleagues at Nomic AI.
lewtunΒ 
posted an update about 1 month ago
view post
Post
2264
This paper ( HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs (2412.18925)) has a really interesting recipe for inducing o1-like behaviour in Llama models:

* Iteratively sample CoTs from the model, using a mix of different search strategies. This gives you something like Stream of Search via prompting.
* Verify correctness of each CoT using GPT-4o (needed because exact match doesn't work well in medicine where there are lots of aliases)
* Use GPT-4o to reformat the concatenated CoTs into a single stream that includes smooth transitions like "hmm, wait" etc that one sees in o1
* Use the resulting data for SFT & RL
* Use sparse rewards from GPT-4o to guide RL training. They find RL gives an average ~3 point boost across medical benchmarks and SFT on this data already gives a strong improvement.

Applying this strategy to other domains could be quite promising, provided the training data can be formulated with verifiable problems!
  • 1 reply
Β·
sayakpaulΒ 
posted an update about 1 month ago
view post
Post
4326
Commits speak louder than words πŸ€ͺ

* 4 new video models
* Multiple image models, including SANA & Flux Control
* New quantizers -> GGUF & TorchAO
* New training scripts

Enjoy this holiday-special Diffusers release πŸ€—
Notes: https://github.com/huggingface/diffusers/releases/tag/v0.32.0
anton-lΒ 
posted an update about 2 months ago
view post
Post
2366
Introducing πŸ“π…π’π§πžπŒπšπ­π‘: the best public math pre-training dataset with 50B+ tokens!
HuggingFaceTB/finemath

Math remains challenging for LLMs and by training on FineMath we see considerable gains over other math datasets, especially on GSM8K and MATH.

We build the dataset by:
πŸ› οΈ carefully extracting math data from Common Crawl;
πŸ”Ž iteratively filtering and recalling high quality math pages using a classifier trained on synthetic annotations to identify math reasoning and deduction.

We conducted a series of ablations comparing the performance of Llama-3.2-3B-Base after continued pre-training on FineMath and observe notable gains compared to the baseline model and other public math datasets.

We hope this helps advance the performance of LLMs on math and reasoning! πŸš€
We’re also releasing all the ablation models as well as the evaluation code.

HuggingFaceTB/finemath-6763fb8f71b6439b653482c2
XenovaΒ 
posted an update about 2 months ago
view post
Post
4067
Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser!
πŸš€ Faster and more accurate than Whisper
πŸ”’ Privacy-focused (no data leaves your device)
⚑️ WebGPU accelerated (w/ WASM fallback)
πŸ”₯ Powered by ONNX Runtime Web and Transformers.js

Demo: webml-community/moonshine-web
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/moonshine-web
Β·