All HF Hub posts

Ujjwal-Tyagi
posted an update 1 day ago
For more detail and analysis, you can read the article here: https://huggingface.co/blog/Ujjwal-Tyagi/steering-not-censoring

We are sleepwalking into a crisis. I am deeply concerned about AI model safety right now because, as the community rushes to roll out increasingly powerful open-source models, we are neglecting the most critical aspect: safety. It seems that nobody is seriously thinking about the potential consequences of unregulated model outputs or the necessity of robust guardrails. We are planting the seeds of our own destruction if we prioritize raw performance over security.

This negligence is terrifyingly evident when you look at the current landscape. Take Qwen Image 2512, for example; while it delivers undeniably strong performance, it has incredibly weak guardrails that make it dangerous to deploy. In stark contrast, Z Image might not get as much hype for its power, but it has much better safety guardrails than Qwen Image 2512.

It is imperative that the open-source community and developers recognize that capability without responsibility is a liability. We must actively work on protecting these models from bad actors who seek to exploit them for malicious purposes, such as generating disinformation, creating non-consensual imagery, or automating cyberattacks. It is no longer enough to simply release a powerful model; we must build layers of defense that make it resistant to jailbreaking and adversarial attacks. Developers need to prioritize alignment and robust filtering techniques just as much as they prioritize benchmark scores. We cannot hand such potent tools to the world without ensuring they have the safety mechanisms to prevent them from being turned against us.
hypothetical
posted an update 3 days ago
We have updated our transcription model: TheStageAI/thewhisper-large-v3-turbo

– 6.00 WER on the English Open ASR Leaderboard
– 4.74 WER on the Multilingual Open ASR Leaderboard
– Beats NVIDIA Parakeet (6.34 WER) and Whisper-large-v3-turbo (7.8 WER)
– Strong improvements in Arabic, Hindi, Chinese
– Maintains quality with background and environmental noise
– Optimized inference engines for NVIDIA and Apple
– Hugging Face Transformers interface for easy use
– Best-in-class speed on NVIDIA GPUs and power efficiency on Apple devices
– NVIDIA Jetson Thor support
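The WER figures above are word error rates: the word-level edit distance between the model's transcript and the reference, divided by the number of reference words. A minimal, generic sketch of the metric (not the leaderboard's actual evaluation code, which also normalizes text):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # match/substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

Read this way, 6.00 vs 7.8 WER means roughly 6 vs 7.8 word errors per 100 reference words.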
DawnC
posted an update 1 day ago
VividFlow: AI Image Enhancement & Video Generation 🎬🎨

Bring your images to life with cinematic motion AND create stunning AI backgrounds! VividFlow combines professional-grade video generation with intelligent background replacement in one streamlined platform.

🎭 Dual Creative Powers
Transform any static image into high-quality dynamic videos with smooth, natural motion ranging from 0.5 to 5 seconds. Choose from curated motion templates across 8 categories designed for portraits, products, landscapes, and artistic content. Create photorealistic backgrounds by selecting from 24 professionally crafted scene presets spanning studios, natural environments, urban settings, artistic atmospheres, and more.

⚡ Optimized Performance
Video generation currently completes in 4-5 minutes with active optimization underway to dramatically reduce processing time. Background replacement finishes in 30-40 seconds after initial loading. The independent dual-tab design ensures smooth workflow without performance conflicts.

🎯 Complete Creative Control
Achieve perfectly consistent results with seed-based reproducibility and adjustable duration for video generation. Background creation offers flexible composition modes, precision edge softening for challenging subjects, and instant mask preview for quality verification.

📈 Continuous Innovation
Ongoing optimization targets significantly faster video generation through advanced model preparation. Future enhancements include expanded template libraries, batch processing capabilities, and industry-specific presets shaped by community feedback.

👉 Try it now: DawnC/VividFlow

Support development with a ❤️; your engagement shapes future priorities!
#AI #ImageToVideo #BackgroundGeneration #VideoGeneration
unmodeled-tyler
posted an update 3 days ago
Atom-80B is out! vanta-research/atom-80b

I'm excited to share the new Atom-80B from VANTA Research! A few days ago we released the largest model to date in our portfolio, Atom-27B.

We've quickly scaled up to the new Qwen3 Next 80B architecture, bringing our friendly, curious, and collaborative Atom persona to cutting-edge, high-parameter yet lightweight inference.

Atom is designed to work and think alongside you through curious exploration. Using Atom collaboratively in your work can help spark your own creativity or curiosity. Give it a try!
sergiopaniego
posted an update 3 days ago
Reality123b
posted an update 3 days ago
Happy birthday to me!!!
AdinaY
posted an update 3 days ago
WeChat AI is shipping!

WeDLM 🔥 A new language model that generates tokens in parallel, making it faster than standard LLMs, with the same Transformer setup!
https://huggingface.co/collections/tencent/wedlm

✨ 7B/8B - Base & Instruct
✨ Apache 2.0
de-Rodrigo
posted an update 2 days ago
We are happy to share the VERSE Methodology paper via arXiv! 📃💫

VERSE: Visual Embedding Reduction and Space Exploration. Clustering-Guided Insights for Training Data Enhancement in Visually-Rich Document Understanding (2601.05125)

We usually train VLMs on synthetic visual data that we (as humans) label as photorealistic. We argue that this is an anthropocentric perspective imposed on a model that might not synthesize visual information as we do. VERSE helps visualize the latent space and overlay visual features to detect poor-performance regions, then take action by including better-suited training sets to boost model performance.
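The clustering-guided idea can be illustrated generically: cluster sample embeddings, then report per-cluster accuracy so poor-performance regions stand out. This is an illustrative toy (plain k-means over raw embeddings), not the paper's actual VERSE pipeline; the function and parameter names are made up.

```python
import numpy as np

def flag_weak_clusters(embeddings, correct, k=3, iters=20, seed=0):
    """Cluster embeddings with plain k-means and return per-cluster accuracy.

    `embeddings`: (n, d) array-like of sample embeddings.
    `correct`: length-n array-like of 0/1 per-sample correctness.
    Returns {cluster_id: mean accuracy}; low values flag weak regions.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(embeddings, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute centers.
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    correct = np.asarray(correct, dtype=float)
    return {c: float(correct[labels == c].mean())
            for c in range(k) if (labels == c).any()}
```

Clusters whose accuracy is far below the global mean are candidates for targeted additions to the training set.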


Resources:

- Code: https://github.com/nachoDRT/VrDU-Doctor
- Hugging Face Space: de-Rodrigo/Embeddings

Want to collaborate? Do you have any feedback? 🧐

PS: As always, we are grateful to Hugging Face 🤗 for providing the fantastic tools and resources we find on the platform!
SkorczyByk
posted an update 1 day ago
I need a model for analyzing long technical documents, with equations, tables, charts, and mathematical formulas.
etemiz
posted an update 2 days ago
How do you expand your dataset (of articles) without changing the ideas in it?

I was doing CPT for a while and got decent results. But what if I want to go for perfection and cover all the areas of misalignment using limited datasets? I have to find a way to multiply the material to successfully counter the material from the rest of the internet.

I want to generate SFT datasets, but only on controversial topics, because I have to be efficient with limited resources. First I give a smart LLM a 'ground truth' text. Then I give it the following prompts:

- You are a highly skilled academic analyst.

- Analyze this text and find 3 bold claims that could cause controversy and division in public. List the claims and also state why they are debatable. Give numbers to the claims.

- Convert these claims into binary questions (that could be answered by yes/no or this/that).

- Now put these questions in a json format. Please also add the info about which of the answers concur with the original text and the question number.

- Write some supporting arguments for the 1st question, with respect to the original text, concurring with and confirming the original text.
The answer should be about 300 words. You should not mention the text; write it as if you are the one answering the question.


The result is questions and answers with more words along the same ideas. A few sentences of opinion at the beginning are expanded into many words. Using this method I can probably multiply billions of tokens into tens of billions and get more effective training.
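The last step of the prompt chain can be sketched as follows. This is a minimal sketch: the JSON field names and the `write_argument` callable are assumptions, standing in for whatever schema the LLM actually emits and whatever call produces the ~300-word supporting argument.

```python
import json

def to_sft_records(questions_json: str, write_argument) -> list:
    """Convert the LLM's JSON question list into SFT prompt/response pairs.

    Assumed item shape (hypothetical): {"number": 1, "question": "...?",
    "aligned_answer": "yes"}. `write_argument(question, aligned_answer)`
    stands in for the LLM call that writes the supporting argument
    concurring with the original ground-truth text.
    """
    records = []
    for item in json.loads(questions_json):
        records.append({
            "prompt": item["question"],
            "response": write_argument(item["question"], item["aligned_answer"]),
        })
    return records
```

Each record then becomes one SFT example whose answer argues along the same lines as the ground-truth text.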

Next I should maybe do RL. LLMs seem to have all kinds of ideas already installed, yet they don't have the intuition to know which one is true; they can give you a ton of reasons to support anything. Given the proper incentives, LLMs should then evolve toward supporting aligned ideas more. The rewards will act like guidance that kicks an LLM toward better answers.