Dcas89 PRO


AI & ML interests

None yet

Recent Activity

liked a model about 14 hours ago
microsoft/Phi-4-mini-reasoning
reacted to DawnC's post with 🔥 3 days ago
PawMatchAI 🐾: The Complete Dog Breed Platform

Organizations

None yet

Dcas89's activity

reacted to DawnC's post with 🔥 3 days ago
PawMatchAI 🐾: The Complete Dog Breed Platform

PawMatchAI offers a comprehensive suite of features designed for dog enthusiasts and prospective owners alike. This all-in-one platform delivers five essential tools to enhance your canine experience:

1. 🔍 Breed Detection: Upload any dog photo and the AI accurately identifies the breed from an extensive database of 124+ dog breeds. The system detects dogs in the image and provides confident breed identification results.

2. 📊 Breed Information: Access detailed profiles for each breed covering exercise requirements, typical lifespan, grooming needs, health considerations, and noise behavior - giving you a complete understanding of any breed's characteristics.

3. 📋 Breed Comparison: Compare any two breeds side-by-side with intuitive visualizations highlighting differences in care requirements, personality traits, health factors, and more - perfect for making informed decisions.

4. 💡 Breed Recommendation: Receive personalized breed suggestions based on your lifestyle preferences. The sophisticated matching system evaluates compatibility across multiple factors including living space, exercise capacity, experience level, and family situation.

5. 🎨 Style Transfer: Transform your dog photos into artistic masterpieces with five distinct styles: Japanese Anime, Classic Cartoon, Oil Painting, Watercolor, and Cyberpunk - adding a creative dimension to your pet photography.

👋 Explore PawMatchAI today:
DawnC/PawMatchAI

If you enjoy this project or find it valuable for your canine companions, I'd greatly appreciate your support with a Like ❤️.

#ArtificialIntelligence #MachineLearning #ComputerVision #PetTech #TechForLife
reacted to merve's post with 🔥 6 days ago
A ton of impactful models and datasets landed in open AI this past week; let's summarize the best 🤩 merve/releases-apr-21-and-may-2-6819dcc84da4190620f448a3

💬 Qwen made it rain! They released Qwen3: new dense and MoE models ranging from 0.6B to 235B 🤯 as well as Qwen2.5-Omni, an any-to-any model in 3B and 7B!
> Microsoft AI released Phi-4 reasoning models (which also come in mini and plus sizes)
> NVIDIA released new CoT reasoning datasets
🖼️ > ByteDance released UI-TARS-1.5, a native multimodal UI parsing agentic model
> Meta released EdgeTAM, an on-device object tracking model (SAM2 variant)
🗣️ NVIDIA released parakeet-tdt-0.6b-v2, a smol 600M automatic speech recognition model
> Nari released Dia, a 1.6B text-to-speech model
> Moonshot AI released Kimi Audio, a new audio understanding, generation, conversation model
👩🏻‍💻 JetBrains released Mellum models in base and SFT for coding
> Tesslate released UIGEN-T2-7B, a new text-to-frontend-code model 🤩
reacted to clem's post with ❤️ 6 days ago
What are you using to evaluate models or AI systems? So far we're building lighteval & leaderboards on the Hub, but it still feels early & there's a lot more to build. What would be useful to you?
reacted to eaddario's post with 👍 15 days ago
Until recently, watt-ai/watt-tool-70B was the best-performing model on the Berkeley Function-Calling Leaderboard (https://gorilla.cs.berkeley.edu/leaderboard.html), which evaluates LLMs' ability to call functions (tools) accurately. The top spot now belongs to Salesforce/Llama-xLAM-2-70b-fc-r, and by quite a wide margin!

Layer-wise quantized versions for both models are available at eaddario/Llama-xLAM-2-8b-fc-r-GGUF and eaddario/Watt-Tool-8B-GGUF
reacted to Kseniase's post with 👍 15 days ago
6 Free resources on Reinforcement Learning (RL)

RL is now where the real action is: it's the engine behind autonomous tech, robots, and the next wave of AI that thinks, moves, and solves problems on its own. To stay up to date with what's happening in RL, we offer some fresh materials on it:

1. "Reinforcement Learning from Human Feedback" by Nathan Lambert -> https://rlhfbook.com/
It's a short introduction to RLHF, explaining instruction tuning, reward modeling, alignment methods, synthetic data, evaluation, and more

2. "A Course in Reinforcement Learning (2nd Edition)" by Dimitri P. Bertsekas -> https://www.mit.edu/~dimitrib/RLbook.html
Explains dynamic programming (DP) and RL, diving into rollout algorithms, neural networks, policy learning, etc. It's packed with solved exercises and real-world examples

3. "Mathematical Foundations of Reinforcement Learning" video course by Shiyu Zhao -> https://www.youtube.com/playlist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8
Offers a mathematical yet friendly introduction to RL, covering Bellman Equation, value iteration, Monte Carlo learning, approximation, policy gradient, actor-critic methods, etc.
+ Check out the repo for more: https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning

4. "Multi-Agent Reinforcement Learning" by Stefano V. Albrecht, Filippos Christianos, and Lukas Schรคfer -> https://www.marl-book.com/
Covers models, core ideas of multi-agent RL (MARL) and modern approaches to combining it with deep learning

5. "Reinforcement Learning: A Comprehensive Overview" by Kevin P. Murphy -> https://arxiv.org/pdf/2412.05265
Explains RL and sequential decision making, covering value-based, policy-gradient, model-based, and multi-agent RL methods, RL+LLMs, RL+inference, and other topics

6. Our collection of free courses and books on RL -> https://huggingface.co/posts/Kseniase/884818121094439

If you liked this, also subscribe to The Turing Post: https://www.turingpost.com/subscribe
reacted to DawnC's post with 🔥 16 days ago
I'm excited to introduce VisionScout, an interactive vision tool that makes computer vision both accessible and powerful! 👀🔍

What can VisionScout do right now?
🖼️ Upload any image and detect 80 different object types using YOLOv8 (see the sketch after this list).
🔄 Instantly switch between Nano, Medium, and XLarge models depending on your speed vs. accuracy needs.
🎯 Filter specific classes (people, vehicles, animals, etc.) to focus only on what matters to you.
📊 View detailed statistics about detected objects, confidence levels, and spatial distribution.
🎨 Enjoy a clean, intuitive interface with responsive design and enhanced visualizations.
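
For a sense of what that detection workflow can look like in code, here is a minimal sketch using the ultralytics YOLOv8 API (an illustrative example with assumed file names and defaults, not VisionScout's actual implementation):

```python
# Minimal sketch: YOLOv8 detection with model switching and class filtering.
# Illustrative only; "dog_park.jpg" and the helper are assumptions, not VisionScout's code.
from ultralytics import YOLO

def detect(image_path, size="n", classes=None, conf=0.25):
    """size: 'n' (Nano), 'm' (Medium) or 'x' (XLarge); classes: COCO class ids to keep."""
    model = YOLO(f"yolov8{size}.pt")  # downloads the checkpoint on first use
    result = model(image_path, classes=classes, conf=conf)[0]
    # Return (class name, confidence) for each detected box
    return [(result.names[int(box.cls)], round(float(box.conf), 3)) for box in result.boxes]

# Example: keep only people (class 0) and dogs (class 16), using the Medium model
print(detect("dog_park.jpg", size="m", classes=[0, 16]))
```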

What's next?
I'm working on exciting updates:
- Support for more models
- Video processing and object tracking across frames
- Faster real-time detection
- Improved mobile responsiveness

The goal is to build a complete but user-friendly vision toolkit for both beginners and advanced users.

Try it yourself! 🚀
DawnC/VisionScout

I'd love to hear your feedback: what features would you find most useful? Any specific use cases you'd love to see supported?

Give it a try and let me know your thoughts in the comments! Stay tuned for future updates.

#ComputerVision #ObjectDetection #YOLO #MachineLearning #TechForLife
reacted to nicolay-r's post with 🔥 16 days ago
🚀 Delighted to share a major milestone in adapting reasoning techniques for augmenting data collections!
Introducing bulk-chain 1.0.0, the first major release of a no-string API for adapting your LLM to Chain-of-Thought-like reasoning over records with a large number of parameters across large datasets.

โญ Check it out: https://github.com/nicolay-r/bulk-chain

What's new and why it matters:
📦 Fully no-string API for easy client deployment
🔥 Demos are now standalone projects:

Demos:
📺 bash / shell (dispatched): https://github.com/nicolay-r/bulk-chain-shell
📺 tksheet: https://github.com/nicolay-r/bulk-chain-tksheet-client

Using nlp-thirdgate to host the supported providers:
🌌 LLM providers: https://github.com/nicolay-r/nlp-thirdgate
reacted to danielhanchen's post with 🔥 17 days ago
🦥 Introducing Unsloth Dynamic v2.0 GGUFs!
Our v2.0 quants set new benchmarks on 5-shot MMLU and KL Divergence, meaning you can now run & fine-tune quantized LLMs while preserving as much accuracy as possible.

Llama 4: unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF
DeepSeek-R1: unsloth/DeepSeek-R1-GGUF-UD
Gemma 3: unsloth/gemma-3-27b-it-GGUF

We made selective layer quantization much smarter. Instead of modifying only a subset of layers, we now dynamically quantize all layers, so every layer can use a different bit width. Our dynamic method can now be applied to all LLM architectures, not just MoEs.
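
To make the general idea concrete, here is a toy Python sketch of per-layer bit allocation (my own simplification with made-up weight distributions and an assumed error budget, not Unsloth's actual Dynamic v2.0 algorithm):

```python
# Toy illustration of per-layer bit allocation: each layer gets the smallest bit width
# that keeps its round-trip quantization error under a budget. Not Unsloth's method.
import numpy as np

def fake_quantize(w, bits):
    """Symmetric round-to-nearest quantization, then dequantization back to float."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def choose_bits(w, bit_options=(2, 3, 4, 5, 6, 8), max_rel_error=0.10):
    """Pick the smallest bit width whose relative quantization error stays under the budget."""
    for bits in bit_options:
        rel_error = np.linalg.norm(w - fake_quantize(w, bits)) / np.linalg.norm(w)
        if rel_error <= max_rel_error:
            return bits
    return bit_options[-1]

# Layers with different weight distributions end up with different bit widths.
rng = np.random.default_rng(0)
layers = {
    "mlp.gate": rng.uniform(-1, 1, size=(64, 64)),       # light tails: easy to quantize
    "attn.q_proj": rng.normal(size=(64, 64)),
    "attn.k_proj": rng.standard_t(df=3, size=(64, 64)),  # heavy tails: needs more bits
}
for name, w in layers.items():
    print(name, "->", choose_bits(w), "bits")
```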

Blog with Details: https://docs.unsloth.ai/basics/dynamic-v2.0

All our future GGUF uploads will leverage Dynamic 2.0 and our hand-curated 300K–1.5M token calibration dataset to improve conversational chat performance.

For accurate benchmarking, we built an evaluation framework to match the reported 5-shot MMLU scores of Llama 4 and Gemma 3. This allowed apples-to-apples comparisons between full precision, Dynamic v2.0, QAT, and standard iMatrix quants.

Dynamic v2.0 aims to minimize the performance gap between full-precision models and their quantized counterparts.
reacted to julien-c's post with 🔥 17 days ago
BOOOOM: Today I'm dropping TINY AGENTS

the 50-lines-of-code Agent in JavaScript 🔥

I spent the last few weeks working on this, so I hope you will like it.

I've been diving into MCP (Model Context Protocol) to understand what the hype was all about.

It is fairly simple, but still quite powerful: MCP is a standard API to expose sets of Tools that can be hooked to LLMs.

But while doing that came my second realization:

Once you have an MCP Client, an Agent is literally just a while loop on top of it. 🤯
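
To make that concrete, here is a minimal Python sketch of the idea (the blog post's agent is JavaScript, and the MCP client and LLM below are toy stand-ins I made up, not a real MCP SDK):

```python
# A toy sketch of "an agent is just a while loop over an MCP client".
# FakeMCPClient and fake_llm are stand-ins for illustration, not real MCP or LLM APIs.
import json

class FakeMCPClient:
    """Stand-in for an MCP client: lists tools and executes tool calls."""
    def list_tools(self):
        return [{"name": "add", "description": "Add two numbers a and b"}]

    def call_tool(self, name, arguments):
        if name == "add":
            return arguments["a"] + arguments["b"]
        raise ValueError(f"unknown tool: {name}")

def fake_llm(messages, tools):
    """Stand-in for an LLM: requests one tool call, then produces a final answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "add", "arguments": {"a": 2, "b": 3}}}
    return {"content": f"The answer is {messages[-1]['content']}"}

def run_agent(client, prompt):
    messages = [{"role": "user", "content": prompt}]
    tools = client.list_tools()
    while True:                          # the whole "agent" is this loop
        reply = fake_llm(messages, tools)
        if "tool_call" not in reply:     # no more tool calls: return the final answer
            return reply["content"]
        call = reply["tool_call"]
        result = client.call_tool(call["name"], call["arguments"])
        messages.append({"role": "tool", "content": json.dumps(result)})

print(run_agent(FakeMCPClient(), "What is 2 + 3?"))
```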

➡️ read it exclusively on the official HF blog: https://huggingface.co/blog/tiny-agents
reacted to merve's post with 🔥 18 days ago
Don't sleep on AI at Meta's new vision-language release! 🔥

facebook/perception-encoder-67f977c9a65ca5895a7f6ba1
facebook/perception-lm-67f9783f171948c383ee7498

Meta dropped Swiss Army knives for vision with an Apache 2.0 license 👏
> image/video encoders for vision-language modelling and spatial understanding (object detection etc.) 👏
> The vision LM outperforms InternVL3 and Qwen2.5VL 👏
> They also release gigantic video and image datasets

The authors attempt to come up with a single versatile vision encoder to align on a diverse set of tasks.

They trained Perception Encoder (PE) Core: a new state-of-the-art family of vision encoders that can be aligned for both vision-language and spatial tasks. For zero-shot image tasks, it outperforms the latest SOTA, SigLIP2 👏



> Among the fine-tuned ones, the first is PE-Spatial. It's a model for bounding-box detection, segmentation, and depth estimation, and it outperforms all other models 😮



> The second is PLM, Perception Language Model, where they combine PE-Core with the Qwen2.5 7B LM. It outperforms all other models (including InternVL3, which was also trained with the Qwen2.5 LM!)

The authors release the following checkpoints in sizes base, large and giant:

> 3 PE-Core checkpoints (224, 336, 448)
> 2 PE-Lang checkpoints (L, G)
> One PE-Spatial (G, 448)
> 3 PLM (1B, 3B, 8B)
> Datasets



The authors release the following datasets 📑
> PE Video: a gigantic video dataset of 1M videos with 120K expert annotations ⏯️
> PLM-Video and PLM-Image: human- and auto-annotated image and video datasets on region-based tasks
> PLM-VideoBench: a new video benchmark on MCQA
posted an update 18 days ago
After months of experimentation, I'm excited to share Aurea - a novel adaptive Spatial-Range attention mechanism that approaches multimodal fusion from a fundamentally different angle.

Most vision-language models use a single vision encoder followed by simple projection layers, creating a bottleneck that forces rich visual information through a single representational "funnel" before language integration.

What if we could integrate multiple visual perspectives throughout the modeling process?

The key innovation in Aurea isn't just using multiple encoders (DINOv2 + SigLIP2) - it's how we fuse them. The spatial-range attention mechanism preserves both spatial relationships and semantic information.

This dual awareness allows for richer representations that can be used for any downstream task. For instance, Aurea can better understand the relational positions between objects, fine-grained details, and complex spatial hierarchies.
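
As a rough picture of what a spatial-range weighting step can look like, here is a toy Python sketch (my own bilateral-filter-style simplification with placeholder shapes and Gaussian kernels, not Aurea's actual mechanism or its CUDA kernel):

```python
# Toy spatial-range fusion over two encoders' patch features: each patch aggregates
# neighbors weighted by spatial proximity AND feature (range) similarity.
# Shapes, kernels, and hyperparameters here are illustrative assumptions.
import numpy as np

def spatial_range_fuse(feat_a, feat_b, grid, sigma_spatial=2.0, sigma_range=4.0):
    """feat_a, feat_b: (N, D) patch features from two encoders on the same N-patch grid.
    grid: (N, 2) patch coordinates. Returns (N, 2D) fused features."""
    feats = np.concatenate([feat_a, feat_b], axis=-1)                 # (N, 2D) joint features
    d_spatial = np.linalg.norm(grid[:, None] - grid[None], axis=-1)   # (N, N) patch distances
    d_range = np.linalg.norm(feats[:, None] - feats[None], axis=-1)   # (N, N) feature distances
    logits = -(d_spatial ** 2) / (2 * sigma_spatial ** 2) - (d_range ** 2) / (2 * sigma_range ** 2)
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)                    # row-normalized attention
    return weights @ feats

# Example: a 4x4 patch grid with 8-dim features from each "encoder"
rng = np.random.default_rng(0)
grid = np.stack(np.meshgrid(np.arange(4), np.arange(4)), -1).reshape(-1, 2).astype(float)
fused = spatial_range_fuse(rng.normal(size=(16, 8)), rng.normal(size=(16, 8)), grid)
print(fused.shape)  # (16, 16)
```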

I've integrated Aurea into a language model (Phi-4 Mini) via basic pre-training and instruction-tuning. Everything is available - code, weights, and documentation. The CUDA implementation is particularly interesting if you enjoy high-performance computing.

I'd love to see what the community builds with this foundation and would appreciate your feedback. Whether you're interested in theoretical aspects of multimodal fusion or practical applications, there's something in Aurea for you.