* 4 new video models * Multiple image models, including SANA & Flux Control * New quantizers -> GGUF & TorchAO * New training scripts Enjoy this holiday-special Diffusers release 🤗 Notes: https://github.com/huggingface/diffusers/releases/tag/v0.32.0
📢 So far I noticed that 🧠 reasoning with llm 🤖 in English is tend to be more accurate than in other languages. However, besides the GoogleTrans and other open transparent translators, I could not find one that could be easy to use solutions to avoid: 1.🔴 Third-party framework installation 2.🔴 Text chunking 3.🔴 support of meta-annotation like spans / objects / etc.
💎 To cope problem of IR from non-english texts, I am happy to share the bulk-translate 0.25.0. 🎊
bulk-translate is a tiny Python 🐍 no-string framework that allows translate series of texts with the pre-annotated fixed-spans that are invariant for translator.
It supports 👨💻 API for quick data translation with (optionaly) annotated objects in texts (see figure below) in Python 🐍 I make it accessible as much as possible for RAG and / or LLM-powered app downstreams: 📘 https://github.com/nicolay-r/bulk-translate/wiki
All you have to do is to provide iterator of texts, where each text: 1. ✅ String object 2. ✅ List of strings and nested lists that represent spans (value + any ID data).
Implements from first-principle a discrete flow matching model for code generation- trained a small sized 2D dfm model on two variations of code for binary search. The result was amazing, code in comment: Code: https://github.com/Jaykef/ai-algorithms/blob/main/dfm.ipynb
🦾 Experience faster, lighter, and smarter language models! The new FastLlama makes Meta's LLaMA models work with smaller file sizes, lower system requirements, and higher performance. The model supports 8 languages, including English, German, and Spanish.
🤖 Built on the LLaMA 3.2-1B-Instruct model, fine-tuned with Hugging Face's SmolTalk and MetaMathQA-50k datasets, and powered by LoRA (Low-Rank Adaptation) for groundbreaking mathematical reasoning.
After 6 years, BERT, the workhorse of encoder models, finally gets a replacement: 𝗪𝗲𝗹𝗰𝗼𝗺𝗲 𝗠𝗼𝗱𝗲𝗿𝗻𝗕𝗘𝗥𝗧! 🤗
We talk a lot about ✨Generative AI✨, meaning "Decoder version of the Transformers architecture", but this is only one of the ways to build LLMs: encoder models, that turn a sentence in a vector, are maybe even more widely used in industry than generative models.
The workhorse for this category has been BERT since its release in 2018 (that's prehistory for LLMs).
It's not a fancy 100B parameters supermodel (just a few hundred millions), but it's an excellent workhorse, kind of a Honda Civic for LLMs.
Many applications use BERT-family models - the top models in this category cumulate millions of downloads on the Hub.
➡️ Now a collaboration between Answer.AI and LightOn just introduced BERT's replacement: ModernBERT.
𝗧𝗟;𝗗𝗥: 🏛️ Architecture changes: ⇒ First, standard modernizations: - Rotary positional embeddings (RoPE) - Replace GeLU with GeGLU, - Use Flash Attention 2 ✨ The team also introduced innovative techniques like alternating attention instead of full attention, and sequence packing to get rid of padding overhead.
🥇 As a result, the model tops the game of encoder models: It beats previous standard DeBERTaV3 for 1/5th the memory footprint, and runs 4x faster!
Introducing 📐𝐅𝐢𝐧𝐞𝐌𝐚𝐭𝐡: the best public math pre-training dataset with 50B+ tokens! HuggingFaceTB/finemath
Math remains challenging for LLMs and by training on FineMath we see considerable gains over other math datasets, especially on GSM8K and MATH.
We build the dataset by: 🛠️ carefully extracting math data from Common Crawl; 🔎 iteratively filtering and recalling high quality math pages using a classifier trained on synthetic annotations to identify math reasoning and deduction.
We conducted a series of ablations comparing the performance of Llama-3.2-3B-Base after continued pre-training on FineMath and observe notable gains compared to the baseline model and other public math datasets.
We hope this helps advance the performance of LLMs on math and reasoning! 🚀 We’re also releasing all the ablation models as well as the evaluation code.
🎯The space handles documenting content from the input image along with standardized plain text. It includes adjustment tools with over 30 font styles, file formatting support for PDF and DOCX, textual alignments, font size adjustments, and line spacing modifications.
📄PDFs are rendered using the ReportLab software library toolkit.
🕰️ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years.
👴🏻 If they had needed all this time, we would have GPU stories from the time of Pharaoh 𓂀: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates, this shall delay the building of your computing temple by many moons "
🛠️ But instead, they just parallelized the training on 24k H100s, which made it take just a few months. This required parallelizing across 4 dimensions: data, tensor, context, pipeline. And it is infamously hard to do, making for bloated code repos that hold together only by magic.
🤏 𝗕𝘂𝘁 𝗻𝗼𝘄 𝘄𝗲 𝗱𝗼𝗻'𝘁 𝗻𝗲𝗲𝗱 𝗵𝘂𝗴𝗲 𝗿𝗲𝗽𝗼𝘀 𝗮𝗻𝘆𝗺𝗼𝗿𝗲! Instead of building mega-training codes, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. A team has built Nanotron, already widely used in industry. And now a team releases Picotron, a radical approach to code 4D Parallelism in just a few hundred lines of code, a real engineering prowess, making it much easier to understand what's actually happening!
⚡ 𝗜𝘁'𝘀 𝘁𝗶𝗻𝘆, 𝘆𝗲𝘁 𝗽𝗼𝘄𝗲𝗿𝗳𝘂𝗹: Counting in MFU (Model FLOPs Utilization, how much the model actually uses all the compute potential), this lib reaches ~50% on SmolLM-1.7B model with 8 H100 GPUs, which is really close to what huge libs would reach. (Caution: the team is leading further benchmarks to verify this)
In the past seven days, the Diffusers team has shipped:
1. Two new video models 2. One new image model 3. Two new quantization backends 4. Three new fine-tuning scripts 5. Multiple fixes and library QoL improvements
Coffee on me if someone can guess 1 - 4 correctly.
The paper has a lot of experiments (they trained 84 models!) about what makes the video LMs work ⏯️
Try the demo for best setup here https://huggingface.co/spaces/Apollo-LMMs/Apollo-3B they evaluate sampling strategies, scaling laws for models and datasets, video representation and more! > The authors find out that whatever design decision was applied to small models also scale properly when the model and dataset are scaled 📈 scaling dataset has diminishing returns for smaller models > They evaluate frame sampling strategies, and find that FPS sampling is better than uniform sampling, and they find 8-32 tokens per frame optimal > They also compare image encoders, they try a variation of models from shape optimized SigLIP to DINOv2 they find google/siglip-so400m-patch14-384 to be most powerful 🔥 > they also compare freezing different parts of models, training all stages with some frozen parts give the best yield
They eventually release three models, where Apollo-3B outperforms most 7B models and Apollo 7B outperforms 30B models 🔥
Introducing the Synthetic Data Generator, a user-friendly application that takes a no-code approach to creating custom datasets with Large Language Models (LLMs). The best part: A simple step-by-step process, making dataset creation a non-technical breeze, allowing anyone to create datasets and models in minutes and without any code.
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥
How? By combining step-wise reward models with tree search algorithms :)
We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"
We're open sourcing the full recipe and sharing a detailed blog post.
In our blog post we cover:
📈 Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test-time.
🎄 Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.
🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs and built for speed with vLLM
Here's the space for our new article that leverages LLMs with reinforcement learning to design high-quality small molecules. Check it out at alimotahharynia/GPT-2-Drug-Generator. You can also access the article here: https://arxiv.org/abs/2411.14157. I would be happy to receive your feedback.