PawMatchAI offers a comprehensive suite of features designed for dog enthusiasts and prospective owners alike. This all-in-one platform delivers five essential tools to enhance your canine experience:
1. Breed Detection: Upload any dog photo and the AI identifies the breed against an extensive database of 124+ dog breeds. The system detects the dogs in the image and returns confident breed identification results (a minimal classification sketch follows this feature list).
2. Breed Information: Access detailed profiles for each breed covering exercise requirements, typical lifespan, grooming needs, health considerations, and noise behavior - giving you a complete understanding of any breed's characteristics.
3. Breed Comparison: Compare any two breeds side by side with intuitive visualizations highlighting differences in care requirements, personality traits, health factors, and more - perfect for making informed decisions.
4. Breed Recommendation: Receive personalized breed suggestions based on your lifestyle preferences. The matching system evaluates compatibility across multiple factors, including living space, exercise capacity, experience level, and family situation.
5. Style Transfer: Transform your dog photos into artistic images in five distinct styles - Japanese Anime, Classic Cartoon, Oil Painting, Watercolor, and Cyberpunk - adding a creative dimension to your pet photography.
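Under the hood, the breed detection step amounts to a standard multi-class image classification pass over the 124+ breeds. Purely as an illustration, and assuming a hypothetical fine-tuned torchvision backbone plus a placeholder label list (this is not PawMatchAI's actual code), a minimal sketch might look like:

```python
# Illustrative sketch only: a generic 124-class dog-breed classifier.
# The checkpoint path and breed_labels are hypothetical placeholders,
# not PawMatchAI's actual model or code.
import torch
from PIL import Image
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

breed_labels = [f"breed_{i}" for i in range(124)]          # placeholder breed names

model = models.resnet50(num_classes=124)                   # assumes a fine-tuned checkpoint
model.load_state_dict(torch.load("breed_classifier.pt", map_location="cpu"))
model.eval()

image = preprocess(Image.open("my_dog.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    probs = model(image).softmax(dim=-1)

# Report the top-3 candidate breeds with their confidence scores
top_p, top_idx = probs[0].topk(3)
for p, i in zip(top_p, top_idx):
    print(f"{breed_labels[int(i)]}: {p.item():.1%}")
```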
Qwen made it rain! They released Qwen3: new dense and MoE models ranging from 0.6B to 235B, as well as Qwen2.5-Omni, an any-to-any model in 3B and 7B sizes.
> Microsoft AI released Phi-4 reasoning models (which also come in mini and plus sizes)
> NVIDIA released new CoT reasoning datasets
> ByteDance released UI-TARS-1.5, a native multimodal UI-parsing agentic model
> Meta released EdgeTAM, an on-device object tracking model (a SAM2 variant)
> NVIDIA released parakeet-tdt-0.6b-v2, a smol 600M automatic speech recognition model
> Nari released Dia, a 1.6B text-to-speech model
> Moonshot AI released Kimi Audio, a new audio understanding, generation, and conversation model
> JetBrains released Mellum models in base and SFT variants for coding
> Tesslate released UIGEN-T2-7B, a new text-to-frontend-code model
What are you using to evaluate models or AI systems? So far we're building lighteval & leaderboards on the Hub, but it still feels early and there's a lot more to build. What would be useful to you?
RL is now where the real action is: it's the engine behind autonomous tech, robots, and the next wave of AI that thinks, moves, and solves problems on its own. To stay up to date with what's happening in RL, here are some fresh materials on it:
1. "Reinforcement Learning from Human Feedback" by Nathan Lambert -> https://rlhfbook.com/ It's a short introduction to RLHF, explaining instruction tuning, reward modeling, alignment methods, synthetic data, evaluation, and more
2. "A Course in Reinforcement Learning (2nd Edition)" by Dimitri P. Bertsekas -> https://www.mit.edu/~dimitrib/RLbook.html Explains dynamic programming (DP) and RL, diving into rollout algorithms, neural networks, policy learning, etc. Itโs packed with solved exercises and real-world examples
4. "Multi-Agent Reinforcement Learning" by Stefano V. Albrecht, Filippos Christianos, and Lukas Schรคfer -> https://www.marl-book.com/ Covers models, core ideas of multi-agent RL (MARL) and modern approaches to combining it with deep learning
5. "Reinforcement Learning: A Comprehensive Overview" by Kevin P. Murphy -> https://arxiv.org/pdf/2412.05265 Explains RL and sequential decision making, covering value-based, policy-gradient, model-based, multi-agent RL methods, RL+LLMs, and RL+inference and other topics
I'm excited to introduce VisionScout, an interactive vision tool that makes computer vision both accessible and powerful!
What can VisionScout do right now?
- Upload any image and detect 80 different object types using YOLOv8.
- Instantly switch between Nano, Medium, and XLarge models depending on your speed vs. accuracy needs.
- Filter specific classes (people, vehicles, animals, etc.) to focus only on what matters to you.
- View detailed statistics about detected objects, confidence levels, and spatial distribution.
- Enjoy a clean, intuitive interface with responsive design and enhanced visualizations.
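For anyone who wants to reproduce the core detection step outside the UI, the snippet below uses the public `ultralytics` YOLOv8 API with a selectable model size and a class filter. The image path is a placeholder, and this is a simplified sketch rather than VisionScout's own code:

```python
# Simplified sketch of the detection flow using the public ultralytics API.
# "street.jpg" is a placeholder; this is not VisionScout's actual implementation.
from ultralytics import YOLO

# Swap for speed/accuracy trade-offs: yolov8n.pt (Nano), yolov8m.pt (Medium), yolov8x.pt (XLarge)
model = YOLO("yolov8n.pt")

# Restrict detection to COCO classes 0 (person), 2 (car), 16 (dog)
results = model("street.jpg", classes=[0, 2, 16], conf=0.25)

for r in results:
    for box in r.boxes:
        cls_name = model.names[int(box.cls)]
        print(f"{cls_name}: confidence {float(box.conf):.2f}, xyxy={box.xyxy[0].tolist()}")
```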
What's next? I'm working on exciting updates:
- Support for more models
- Video processing and object tracking across frames
- Faster real-time detection
- Improved mobile responsiveness
The goal is to build a complete but user-friendly vision toolkit for both beginners and advanced users.
Delighted to share a major milestone in adapting reasoning techniques for augmenting data collections! Introducing bulk-chain 1.0.0 -- the first major release of a no-string API for adapting your LLM to Chain-of-Thought-style reasoning over records with a large number of parameters across large datasets.
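bulk-chain's own API isn't reproduced here; as a rough illustration of the underlying idea, the sketch below applies a fixed chain of prompt templates to each record in a dataset, with `call_llm` standing in for whatever backend you plug in. All names and the schema are hypothetical:

```python
# Rough illustration of chain-of-thought-style prompting over dataset records.
# `call_llm` and the schema below are hypothetical stand-ins, not bulk-chain's API.
from typing import Callable

def call_llm(prompt: str) -> str:
    """Placeholder for any LLM backend (local model, API client, etc.)."""
    return f"[LLM output for: {prompt[:40]}...]"

# Each step reads fields produced by earlier steps, mimicking multi-step reasoning.
schema = [
    ("aspects",   "List the key aspects mentioned in this review: {text}"),
    ("sentiment", "Given the aspects {aspects}, classify the overall sentiment of: {text}"),
]

def run_chain(record: dict, steps=schema, llm: Callable[[str], str] = call_llm) -> dict:
    out = dict(record)
    for field, template in steps:
        out[field] = llm(template.format(**out))   # fill the prompt with the fields gathered so far
    return out

records = [{"text": "Battery life is great but the screen scratches easily."}]
enriched = [run_chain(r) for r in records]          # scales to large datasets via batching/streaming
print(enriched[0])
```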
Introducing Unsloth Dynamic v2.0 GGUFs! Our v2.0 quants set new benchmarks on 5-shot MMLU and KL divergence, meaning you can now run & fine-tune quantized LLMs while preserving as much accuracy as possible.
We made selective layer quantization much smarter. Instead of modifying only a subset of layers, we now dynamically quantize all layers, so every layer can use a different bit width. Our dynamic method can now be applied to all LLM architectures, not just MoEs.
All our future GGUF uploads will leverage Dynamic 2.0 and our hand-curated 300K–1.5M token calibration dataset to improve conversational chat performance.
For accurate benchmarking, we built an evaluation framework to match the reported 5-shot MMLU scores of Llama 4 and Gemma 3. This allowed apples-to-apples comparisons of full-precision models against Dynamic v2.0, QAT, and standard iMatrix quants.
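The KL divergence part of that comparison measures how far the quantized model's next-token distribution drifts from the full-precision model's on the same inputs. A minimal per-token computation in PyTorch might look like the sketch below (model names are placeholders; this is not Unsloth's actual evaluation code):

```python
# Sketch: per-token KL divergence between a full-precision and a quantized model.
import torch
import torch.nn.functional as F

@torch.no_grad()
def mean_token_kl(full_logits: torch.Tensor, quant_logits: torch.Tensor) -> float:
    """KL(P_full || P_quant), averaged over all token positions."""
    log_p = F.log_softmax(full_logits.float(), dim=-1)   # reference (full-precision) distribution
    log_q = F.log_softmax(quant_logits.float(), dim=-1)  # quantized distribution
    kl = F.kl_div(log_q, log_p, log_target=True, reduction="none").sum(-1)
    return kl.mean().item()

# Usage (assuming both models were run on the same calibration batch):
# full_logits  = full_model(input_ids).logits    # [batch, seq, vocab]
# quant_logits = quant_model(input_ids).logits
# print(mean_token_kl(full_logits, quant_logits))
```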
Dynamic v2.0 aims to minimize the performance gap between full-precision models and their quantized counterparts.
Meta dropped Swiss Army knives for vision under an Apache 2.0 license:
> image/video encoders for vision-language modelling and spatial understanding (object detection, etc.)
> a vision LM that outperforms InternVL3 and Qwen2.5VL
> they also release gigantic video and image datasets
The authors attempt to come up with a single versatile vision encoder that can be aligned to a diverse set of tasks.
They trained Perception Encoder (PE) Core: a new state-of-the-art family of vision encoders that can be aligned for both vision-language and spatial tasks. On zero-shot image tasks, it outperforms the latest SoTA, SigLIP2.
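"Aligned for vision-language" here means the usual dual-encoder, zero-shot setup: embed an image and a set of text prompts into a shared space and pick the closest prompt. The snippet below shows that pattern with CLIP for concreteness; it is not the PE checkpoints' own loading code or API:

```python
# Zero-shot classification with a generic dual-encoder (CLIP shown for concreteness).
# PE-Core is evaluated on this same kind of task, but this is NOT the PE API.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")                      # placeholder image path
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
logits = model(**inputs).logits_per_image          # image-text similarity scores
print(labels[int(logits.softmax(dim=-1).argmax())])
```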
> Among the fine-tuned variants, the first is PE-Spatial, a model for bounding-box detection, segmentation, and depth estimation; it outperforms all other models.
> The second is PLM (Perception Language Model), which combines PE-Core with the Qwen2.5 7B LM. It outperforms all other models (including InternVL3, which was also trained with a Qwen2.5 LM!)
The authors release checkpoints in base, large, and giant sizes.
The authors also release the following datasets:
> PE Video: a gigantic video dataset of 1M videos with 120K expert annotations
> PLM-Video and PLM-Image: human- and auto-annotated image and video datasets for region-based tasks
> PLM-VideoBench: a new video benchmark on MCQA
After months of experimentation, I'm excited to share Aurea - a novel adaptive Spatial-Range attention mechanism that approaches multimodal fusion from a fundamentally different angle.
Most vision-language models use a single vision encoder followed by simple projection layers, creating a bottleneck that forces rich visual information through a single representational "funnel" before language integration.
What if we could integrate multiple visual perspectives throughout the modeling process?
The key innovation in Aurea isn't just using multiple encoders (DINOv2 + SigLIP2) - it's how we fuse them. The spatial-range attention mechanism preserves both spatial relationships and semantic information.
This dual awareness allows for richer representations that can be used for any downstream task. For instance, Aurea can better understand relational positions between objects, fine-grained details, and complex spatial hierarchies.
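As a rough mental model of fusing two encoder streams (not Aurea's actual spatial-range attention), one stream's tokens can attend over the other's, so that spatially organized patch features are enriched with semantic context. The toy cross-attention sketch below uses shapes and names chosen purely for illustration:

```python
# Toy cross-attention fusion of two vision-encoder feature maps.
# Purely illustrative of mixing spatial and semantic streams;
# this is NOT Aurea's spatial-range attention implementation.
import torch
import torch.nn as nn

class CrossStreamFusion(nn.Module):
    def __init__(self, dim_a: int, dim_b: int, dim: int = 512, heads: int = 8):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, dim)   # e.g. DINOv2-style patch features
        self.proj_b = nn.Linear(dim_b, dim)   # e.g. SigLIP-style patch features
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
        # feats_a: [B, Na, dim_a], feats_b: [B, Nb, dim_b]
        q = self.proj_a(feats_a)              # queries keep stream A's spatial layout
        kv = self.proj_b(feats_b)             # keys/values bring in stream B's semantics
        fused, _ = self.attn(q, kv, kv)
        return q + fused                      # residual: spatial tokens enriched with semantics

# Example with random features standing in for real encoder outputs
fusion = CrossStreamFusion(dim_a=768, dim_b=1152)
out = fusion(torch.randn(1, 256, 768), torch.randn(1, 196, 1152))
print(out.shape)  # torch.Size([1, 256, 512])
```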
I've integrated Aurea into a language model (Phi-4 Mini) via basic pre-training and instruction-tuning. Everything is available - code, weights, and documentation. The CUDA implementation is particularly interesting if you enjoy high-performance computing.
I'd love to see what the community builds with this foundation and would appreciate your feedback. Whether you're interested in theoretical aspects of multimodal fusion or practical applications, there's something in Aurea for you.