Ruslan
uzvisa
AI & ML interests
None yet
Recent Activity
Liked a model 2 days ago: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
Reacted to seawolf2357's post with 👍 3 days ago
🚀 Just Found an Interesting New Leaderboard for Medical AI Evaluation!
I recently stumbled upon a medical domain-specific FACTS Grounding leaderboard on Hugging Face, and the approach to evaluating AI accuracy in medical contexts is quite impressive, so I thought I'd share.
📊 What is FACTS Grounding?
It was originally a benchmark developed by Google DeepMind that measures how well LLMs generate answers based solely on the documents they are given. What's cool about this medical-focused version is that it's designed to test even small open-source models.
🏥 Medical Domain Version Features
236 medical examples: Extracted from the original 860 examples
Tests small models like Qwen 3 1.7B: Great for resource-constrained environments
Uses Gemini 1.5 Flash for evaluation: Simplified to a single judge model
📈 The Evaluation Method is Pretty Neat
Grounding Score: Are all claims in the response supported by the provided document?
Quality Score: Does it properly answer the user's question?
Combined Score: Did it pass both checks?
Since medical information requires extreme accuracy, this thorough verification approach makes a lot of sense.
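To make the three scores concrete, here is a minimal sketch of how they could be tallied. It assumes each example gets a pass/fail verdict on grounding and on quality from the judge model, and that each score is the fraction of examples that pass. The post does not describe the leaderboard's exact aggregation, so the `Judgment` class and `summarize` function below are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """Hypothetical per-example verdict from the judge model."""
    grounded: bool  # all claims are supported by the provided document
    answers: bool   # the response properly answers the user's question


def summarize(judgments):
    """Return the fraction of examples passing each check (assumed aggregation)."""
    n = len(judgments)
    if n == 0:
        raise ValueError("no judgments to score")
    return {
        "grounding": sum(j.grounded for j in judgments) / n,
        "quality": sum(j.answers for j in judgments) / n,
        # An example counts toward the combined score only if it passes both checks.
        "combined": sum(j.grounded and j.answers for j in judgments) / n,
    }


# Made-up verdicts, not real leaderboard data.
example = [
    Judgment(grounded=True, answers=True),
    Judgment(grounded=True, answers=False),
    Judgment(grounded=False, answers=True),
    Judgment(grounded=True, answers=True),
]
scores = summarize(example)
```

Under this assumption the combined score can never exceed either of the other two, since passing both checks is stricter than passing one.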
🔗 Check It Out Yourself
The actual leaderboard: https://huggingface.co/spaces/MaziyarPanahi/FACTS-Leaderboard
💭 My thoughts: As medical AI continues to evolve, evaluation tools like this are becoming increasingly important. The fact that it can test smaller models is particularly helpful for the open-source community!
Organizations
None yet
uzvisa's activity
Great model!
#2 opened 25 days ago by uzvisa

Improved Jinja Chat Template for Apriel-Nemotron-15b-Thinker GGUF (e.g., for LM Studio)
👍
1
10
#1 opened about 1 month ago by debeast6

I’m excited to hear any updates from the DeepCogito Team!
1
#1 opened about 1 month ago by uzvisa

What data is in this LLM?
#1 opened about 2 months ago by uzvisa


New activity in
RichardErkhov/micks99_-_gemma-2b-instruct-ft-Data-Analytics-Digital-Marketing-Project-Management-QAv2-gguf
3 months ago
What is this model for?
8
#1 opened 3 months ago by uzvisa
