BuiDoan (BuiDoan)

liked 3 models 19 days ago

upvoted an article about 1 month ago

Article

The 4 Things Qwen-3's Chat Template Teaches Us

Apr 30 • 54

liked a model about 1 month ago

facebook/OMol25

Updated 5 days ago • 116

reacted to seawolf2357's post with 👀 about 1 month ago

Post

6212

Samsung Hacking Incident: Samsung Electronics' Official Hugging Face Account Compromised
Samsung Electronics' official Hugging Face account has been hacked. Approximately 17 hours ago, two new language models (LLMs) were registered under Samsung Electronics' official Hugging Face account. These models are:

https://huggingface.co/Samsung/MuTokenZero2-32B
https://huggingface.co/Samsung/MythoMax-L2-13B

The model descriptions contain absurd and false claims, such as being trained on "1 million W200 GPUs," hardware that doesn't even exist.
Moreover, community participants on Hugging Face who have noticed this issue are continuously posting that Samsung Electronics' account has been compromised.
There is concern about potential secondary and tertiary damage if users download these LLMs released under the Samsung Electronics account, trusting Samsung's reputation without knowing about the hack.
Samsung Electronics appears to be unaware of this situation, as they have not taken any visible measures yet, such as changing the account password.
Source: https://discord.gg/openfreeai

2 replies

upvoted 2 papers about 1 month ago

MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

Paper • 2505.07608 • Published May 12 • 79

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

Paper • 2505.04921 • Published May 8 • 176

updated a collection about 1 month ago

Great paper

Collection

28 items • Updated May 12

upvoted a paper about 1 month ago

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6 • 170

liked a model about 1 month ago

ByteDance-Seed/Seed-Coder-8B-Instruct

Text Generation • Updated 10 days ago • 11k • 94

reacted to Kseniase's post with 👍 about 1 month ago

Post

4991

11 Alignment and Optimization Algorithms for LLMs

When we need to align models' behavior with the desired objectives, we rely on specialized algorithms that support helpfulness, accuracy, reasoning, safety, and alignment with user preferences. Much of a model’s usefulness comes from post-training optimization methods.

Here are the main optimization algorithms (both classic and new) in one place:

1. PPO (Proximal Policy Optimization) -> Proximal Policy Optimization Algorithms (1707.06347)
Clips the probability ratio to prevent the new policy from diverging too far from the old one, which keeps training updates stable.

2. DPO (Direct Preference Optimization) -> Direct Preference Optimization: Your Language Model is Secretly a Reward Model (2305.18290)
It's a non-RL method in which the LM itself serves as an implicit reward model. It uses a simple loss to raise the preferred answer's probability relative to the less preferred one.

3. GRPO (Group Relative Policy Optimization) -> DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (2402.03300)
An RL method that compares a group of model outputs for the same input and updates the policy based on relative rankings. It doesn't need a separate critic model.
Its latest application is Flow-GRPO, which adds online RL to flow matching models -> Flow-GRPO: Training Flow Matching Models via Online RL (2505.05470)

4. DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization) -> DAPO: An Open-Source LLM Reinforcement Learning System at Scale (2503.14476)
Decouples the clipping bounds for flexibility, introducing 4 key techniques: clip-higher (to maintain exploration), dynamic sampling (to ensure gradient updates), token-level loss (to balance learning across long outputs), and overlong reward shaping (to handle long, truncated answers)

5. Supervised Fine-Tuning (SFT) -> Training language models to follow instructions with human feedback (2203.02155)
Often the first post-pretraining step. A model is fine-tuned on a dataset of high-quality human-written input-output pairs to directly teach desired behaviors
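
As a rough illustration of items 1 to 3 above, here is a minimal per-sample sketch in plain Python. It is not taken from the post or from any library: the function names, the default `eps` and `beta` values, and the use of the population standard deviation in the GRPO helper are assumptions made for the example. The GRPO helper normalizes rewards within a group (mean and standard deviation), which is one common way the "relative" comparison is written down.

```python
import math
from statistics import mean, pstdev


def ppo_clipped_term(ratio, advantage, eps=0.2):
    """PPO clipped surrogate for one sample: min(r*A, clip(r, 1-eps, 1+eps)*A).

    `ratio` is pi_new(a|s) / pi_old(a|s); `eps` is an assumed clip range.
    """
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair: -log(sigmoid(beta * margin)).

    The margin is how much more the policy prefers the chosen answer than
    the frozen reference model does (all inputs are sequence log-probs).
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(x)) == log(1 + exp(-x)); fine for moderate x in a sketch
    return math.log1p(math.exp(-beta * margin))


def grpo_advantages(rewards, tiny=1e-8):
    """Group-relative advantages: each reward normalized within its group.

    No critic is involved; the group mean acts as the baseline.
    Population std is an assumption of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + tiny) for r in rewards]


if __name__ == "__main__":
    print(ppo_clipped_term(ratio=1.5, advantage=1.0))    # gain is capped by the clip
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))          # policy already prefers chosen
    print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))         # above/below group mean
```

In real training code these quantities would be computed on batched tensors and averaged over tokens or sequences; the scalar form is only meant to make the three update rules easy to compare.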

More in the comments 👇

If you liked it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

1 reply

upvoted an article about 1 month ago

Article

Yes, Transformers are Effective for Time Series Forecasting (+ Autoformer)

By

and 2 others •

Jun 16, 2023

• 32

reacted to wolfram's post with 🚀 about 1 month ago

Post

7221

Finally finished my extensive **Qwen 3 evaluations** across a range of formats and quantisations, focusing on **MMLU-Pro** (Computer Science).

A few take-aways stood out - especially for those interested in local deployment and performance trade-offs:

1️⃣ **Qwen3-235B-A22B** (via Fireworks API) tops the table at **83.66%** with ~55 tok/s.
2️⃣ But the **30B-A3B Unsloth** quant delivered **82.20%** while running locally at ~45 tok/s and with zero API spend.
3️⃣ The same Unsloth build is ~5x faster than Qwen's **Qwen3-32B**, which scores **82.20%** as well yet crawls at <10 tok/s.
4️⃣ On Apple silicon, the **30B MLX** port hits **79.51%** while sustaining ~64 tok/s - arguably today's best speed/quality trade-off for Mac setups.
5️⃣ The **0.6B** micro-model races above 180 tok/s but tops out at **37.56%** - that's why it's not even on the graph (50 % performance cut-off).

All local runs were done with LM Studio on an M4 MacBook Pro, using Qwen's official recommended settings.

**Conclusion:** Quantised 30B models now get you ~98 % of frontier-class accuracy - at a fraction of the latency, cost, and energy. For most local RAG or agent workloads, they're not just good enough - they're the new default.
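
The "~98 %" and "~5x" figures can be checked directly from the numbers quoted above. The short sketch below only redoes that arithmetic; the dictionary keys are labels chosen for this example, and the throughput of the 32B model is taken at its stated upper bound of 10 tok/s, so the speed-up it prints is a lower bound.

```python
# Accuracy figures (MMLU-Pro, Computer Science) as quoted in the post above.
scores = {
    "Qwen3-235B-A22B (Fireworks API)": 83.66,
    "30B-A3B Unsloth quant": 82.20,
    "Qwen3-32B": 82.20,
    "30B MLX": 79.51,
    "0.6B": 37.56,
}

frontier = scores["Qwen3-235B-A22B (Fireworks API)"]

# Each model's accuracy as a percentage of the top score.
relative = {name: 100.0 * s / frontier for name, s in scores.items()}

# The post gives ~45 tok/s for the Unsloth quant and "<10 tok/s" for Qwen3-32B,
# so 45 / 10 is a lower bound on the speed-up.
min_speedup = 45 / 10

if __name__ == "__main__":
    for name, pct in relative.items():
        print(f"{name}: {pct:.1f}% of the top score")
    print(f"Unsloth 30B-A3B vs Qwen3-32B: at least {min_speedup:.1f}x faster")
```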

Well done, Qwen - you really whipped the llama's ass! And to OpenAI: for your upcoming open model, please make it MoE, with toggleable reasoning, and release it in many sizes. *This* is the future!

4 replies

liked a model about 1 month ago

nvidia/parakeet-tdt-0.6b-v2

Automatic Speech Recognition • Updated 24 days ago • 875k • 1.14k

upvoted 2 papers about 1 month ago

100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models

Paper • 2505.00551 • Published May 1 • 37

ReasonIR: Training Retrievers for Reasoning Tasks

Paper • 2504.20595 • Published Apr 29 • 55

upvoted an article about 1 month ago

Article

What is MoE 2.0? Update Your Knowledge about Mixture-of-experts

By

and 1 other •

Apr 27

• 9

liked a model about 1 month ago

ginigen/Private-BitSix-Mistral-Small-3.1-24B-Instruct-2503

Updated Mar 31 • 666 • 12

reacted to ginipick's post with ❤️ about 1 month ago

Post

5210

🔮 Mistral Perflexity AI - Local LLM Space with Web Search Capabilities 🌐
Hello AI enthusiasts! Today I'm excited to introduce my special Hugging Face space! 🚀

ginigen/Mistral-Perflexity

✨ Key Features

Powerful Model: Using Private-BitSix-Mistral-Small-3.1-24B-Instruct-2503, optimized through 6-bit quantization to run smoothly on local 4090 GPUs! 💪
Web Search Integration: Leveraging the Brave Search API to provide real-time web search results for user queries! 🔍
Customizable Responses: Shape AI personality and response format through system messages ⚙️
Multilingual Support: Perfect handling of both English and Korean! 🇺🇸🇰🇷

🛠️ Technical Highlights

GGUF Format: Optimized quantized model with excellent memory efficiency
Flash Attention: Applied optimization technology for faster inference speeds
8K Context Window: Capable of handling lengthy conversations and complex queries
Streaming Responses: Watch text being generated in real-time

💡 Use Cases

Complex Q&A requiring real-time information
Programming assistance and code generation
Multilingual content creation and translation
Summarization and explanation of learning materials

🔧 Customization
Adjust various parameters like Temperature, Top-p, Top-k, and repetition penalty to control response creativity and accuracy. Lower temperature (0.1-0.5) produces more deterministic responses, while higher values (0.7-1.0) generate more creative outputs!
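
To make the effect of these parameters concrete, here is a minimal, self-contained sketch of temperature, top-k, and top-p (nucleus) filtering over a toy list of logits. It is not the Space's actual code: the function names and defaults are assumptions for illustration, and the repetition penalty is left out to keep the example short.

```python
import math
import random


def filter_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k and top-p.

    temperature < 1 sharpens the distribution, > 1 flattens it.
    top_k > 0 keeps only the k most likely tokens.
    top_p < 1 keeps the smallest set of top tokens whose mass reaches top_p.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token indices sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]

    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    norm = sum(probs[i] for i in kept)      # renormalize the surviving tokens
    return {i: probs[i] / norm for i in kept}


def sample_token(logits, rng=random, **kwargs):
    """Draw one token index from the filtered distribution."""
    dist = filter_probs(logits, **kwargs)
    indices = list(dist)
    return rng.choices(indices, weights=[dist[i] for i in indices], k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0]
    print(filter_probs(toy_logits, temperature=0.2))   # nearly deterministic
    print(filter_probs(toy_logits, temperature=1.0))   # more spread out
    print(sample_token(toy_logits, temperature=0.7, top_k=2, top_p=0.9))
```

With a low temperature almost all probability lands on the top token, which is why such settings read as more deterministic; raising it spreads the mass and lets less likely tokens through.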

🌟 Try It Yourself!
This space is available for anyone to use for free. Experience the power of a robust local LLM combined with web search capabilities! Your feedback is always welcome! 😊

google/gemma-3n-E4B-it-litert-preview

nari-labs/Dia-1.6B

sarvamai/sarvam-m
