Used YoloV8n Onnx + FastVLM to provide real time object detection and annotation. Works well with videos that do not have a lot of changes. In the space I used a stock security camera footage. It annotates using FastVLM while doing Object Detection using YoloV8n.
๐ข Generate your own data in simulation using two new free and customizable data-generating Scenarios on Duality's FalconCloud service. ๐ These multi-class Scenarios are designed to target model weaknesses for our recent Kaggle competition, but they are free to anyone for non-commercial use!
๐ธ Control object and camera posing ๐ Select random variable ranges ๐ผ๏ธ Set post-processing effects โ and more to create a robust dataset for strong model training.
I couldn't watch innocent people get their rights trampled anymore. So I built something to help.
Stories of families torn apart, U.S. citizens detained for hours, people arrested just for speaking Spanish. This isn't the America I believe in.
Instead of doom-scrolling, I spent a few days building FIREWATCH - a free civil rights protection app.
What it does: โข Real-time ICE raid alerts โข Know Your Rights education in 10+ languages โข Secure evidence recording โข Emergency panic button โข Legal hotlines and resources โข 100% private, no tracking
The catch? There isn't one. You just need a free Google API key that stays on your device. Works completely offline.
I was messing around with the HF api trying to get some stats on all time downloads for my models, and then I made it into a space so that anyone can use it.
๐ For those who interested in multilingual clinical case report sukmmarization ๐ฉบ๐, deligned to share a video-update to the earlier post on Qwen2.5 model family adaptation:
This is 15-min skimming of the study (+ 5 mins for code) in which we overview the application of Qwen model family (72B as a teacher and 0.5B as a student) in summarization of the clinical reports, including detaied overview of the experiments organization. In particular, attempted to cover: 1. Background of previous Seq2Seq models to conclude their limitations 2. ChatML roles exploiting for distilation tuning in clinical report summarization 3. Known limitation of work and unleashing full capabilities
๐ New in Azure Model Catalog: NVIDIA Parakeet TDT 0.6B V2
We're excited to welcome Parakeet TDT 0.6B V2โa state-of-the-art English speech-to-text modelโto the Azure Foundry Model Catalog.
What is it?
A powerful ASR model built on the FastConformer-TDT architecture, offering: ๐ Word-level timestamps โ๏ธ Automatic punctuation & capitalization ๐ Strong performance across noisy and real-world audio
It runs with NeMo, NVIDIAโs optimized inference engine.
Want to give it a try? ๐ง You can test it with your own audio (up to 3 hours) on Hugging Face Spaces before deploying.If it fits your need, deploy easily from the Hugging Face Hub or Azure ML Studio with secure, scalable infrastructure!
๐ Learn more by following this guide written by @alvarobartt