Just Found an Interesting New Leaderboard for Medical AI Evaluation!
I recently stumbled upon a medical domain-specific FACTS Grounding leaderboard on Hugging Face, and the approach to evaluating AI accuracy in medical contexts is quite impressive, so I thought I'd share.
What is FACTS Grounding? It was originally developed by Google DeepMind as a benchmark that measures how well LLMs generate answers grounded solely in provided documents. What's cool about this medical-focused version is that it's designed to test even small open-source models.
Medical Domain Version Features
- 236 medical examples: Extracted from the 860 examples in the original public set
- Tests small models like Qwen3-1.7B: Great for resource-constrained environments
- Uses Gemini 1.5 Flash for evaluation: Simplified to a single judge model instead of the original benchmark's multiple judges
The Evaluation Method Is Pretty Neat
- Grounding Score: Are all claims in the response supported by the provided document?
- Quality Score: Does the response properly answer the user's question?
- Combined Score: Did the response pass both checks?
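To make the three checks concrete, here is a minimal Python sketch of how per-example verdicts could be rolled up into scores. It assumes each score is simply the fraction of examples that pass, and that the judge's output has already been reduced to two booleans per response; the `Judgment` class and `scores` function are hypothetical names, not the leaderboard's actual code.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    # Hypothetical per-example verdicts from the single judge model.
    grounded: bool  # every claim is supported by the provided document
    quality: bool   # the response actually answers the user's question


def scores(judgments: list[Judgment]) -> dict[str, float]:
    """Return the fraction of examples passing each check (assumed aggregation)."""
    n = len(judgments)
    if n == 0:
        raise ValueError("no judgments to score")
    grounding = sum(j.grounded for j in judgments) / n
    quality = sum(j.quality for j in judgments) / n
    # An example only counts toward the combined score if it passes BOTH checks.
    combined = sum(j.grounded and j.quality for j in judgments) / n
    return {"grounding": grounding, "quality": quality, "combined": combined}


example = [
    Judgment(True, True),
    Judgment(True, False),   # grounded, but dodges the question
    Judgment(False, True),   # answers, but adds unsupported claims
    Judgment(True, True),
]
print(scores(example))  # {'grounding': 0.75, 'quality': 0.75, 'combined': 0.5}
```

The combined score is stricter than either component: a response that is perfectly grounded but fails to answer the question still counts as a failure.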
Since medical information requires extreme accuracy, this thorough verification approach makes a lot of sense.
Check It Out Yourself
My thoughts: As medical AI continues to evolve, evaluation tools like this are becoming increasingly important. The fact that it can test smaller models is particularly helpful for the open-source community!
Core Features
- 15 Expert Theories: Professional brand naming
- Bilingual Support: Korean/English for global brands
- Unified Evaluation System: Creativity, memorability, and relevance scores
- Real-time Visualization: Theory-specific custom designs
Applied Theories
Cognitive Theories (4)
- Square Theory: Semantic square structure built from four-word relationships
- Sound Symbolism: Psychological connections between phonemes and meaning
- Cognitive Load: Minimized processing for instant recognition
- Gestalt Theory: Perceptual principles where the whole exceeds the sum of its parts
Creative Theories (3)
- Conceptual Blending: Merging concepts to create new meanings
- SCAMPER Method: 7 creative transformation techniques
- Biomimicry: Nature-inspired wisdom from 3.8 billion years of evolution
Cultural Theories (3)
- Jung's Archetypes: 12 universal archetypes for emotional connection
- Linguistic Relativity: Consideration of cross-cultural thinking patterns
- Memetics: Cultural transmission and evolutionary potential
Differentiation Theories (3)
- Von Restorff Effect: Uniqueness for 30x better recall
- Color Psychology: Emotional associations and color meanings
- Network Effects: Value maximization through network structures
Special Features
Each theory provides unique visualizations and customized analysis:
- Square Theory: 4-corner relationship diagram
- Blending: Concept fusion flowchart
- Color: Interactive color palette display
- Theory-specific insights for each approach
Collection of 178,604 public-domain Scalable Vector Graphics (SVG) clipart images, featuring:
- Comprehensive metadata: title, description, artist name, tags, original page URL, and more.
- Complete SVG XML content (minified) for direct use or processing.
- All images explicitly released into the public domain under the CC0 license.
- A single train split with 178,604 entries.
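Since each entry ships its SVG as a minified XML string, Python's standard library can parse it directly. A minimal sketch, assuming a record shaped like the hypothetical `record` dict below (the field names are illustrative, not the dataset's confirmed schema); `summarize_svg` is likewise a hypothetical helper that reports the `viewBox` and counts the drawing elements.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

# Hypothetical record; field names are illustrative, not the dataset's actual schema.
record = {
    "title": "Red square",
    "tags": ["shape", "red"],
    "svg_content": '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10">'
                   '<rect width="10" height="10" fill="red"/></svg>',
}


def summarize_svg(svg_text: str) -> dict:
    """Parse a minified SVG string and count its child elements by tag name."""
    root = ET.fromstring(svg_text)
    if root.tag != SVG_NS + "svg":
        raise ValueError("not an SVG document")
    counts: dict[str, int] = {}
    for el in root.iter():
        if el is root:  # iter() yields the root itself first; skip it
            continue
        name = el.tag.removeprefix(SVG_NS)  # drop the SVG namespace prefix
        counts[name] = counts.get(name, 0) + 1
    return {"viewBox": root.get("viewBox"), "elements": counts}


print(record["title"], summarize_svg(record["svg_content"]))
# Red square {'viewBox': '0 0 10 10', 'elements': {'rect': 1}}
```

For XML from untrusted sources, the Python documentation recommends a hardened parser such as `defusedxml` instead of plain `ElementTree`.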
Reacted to merterbak's post about 1 month ago:
- Pre-trained on 36 trillion tokens covering 119 languages and dialects, with strong translation and instruction-following abilities. (Qwen2.5 was pre-trained on 18 trillion tokens.)
- Qwen3 dense models match the performance of larger Qwen2.5 models. For example, Qwen3-1.7B/4B/8B/14B/32B perform like Qwen2.5-3B/7B/14B/32B/72B.
- Pretraining was done in three stages:
  • Stage 1: General language learning and knowledge building.
  • Stage 2: Reasoning boost with STEM, coding, and logic skills.
  • Stage 3: Long-context training.
- Supports MCP (Model Context Protocol).
- Strong agent skills.
- Supports seamless switching between thinking mode (for hard tasks like math and coding) and non-thinking mode (for fast chatting) inside the chat template.
- Better human alignment for creative writing, roleplay, multi-turn conversations, and following detailed instructions.
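In thinking mode, Qwen3 places its reasoning inside `<think>...</think>` tags ahead of the final answer, and in non-thinking mode that block is left empty. A minimal sketch for separating the two, assuming the raw generated text is available as a string; `split_thinking` is a hypothetical helper, not part of any Qwen or Transformers API. (The mode itself is toggled through the chat template, e.g. the `enable_thinking` argument described in the Qwen3 model card.)

```python
import re

# Non-greedy match across newlines for the reasoning block.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" when there is none."""
    match = THINK_RE.search(text)
    if match is None:
        # Some chat templates put the opening <think> in the prompt,
        # so the generated text may contain only the closing tag.
        head, sep, tail = text.partition("</think>")
        if sep:
            return head.strip(), tail.strip()
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


raw = "<think>\n2 + 2 is basic arithmetic.\n</think>\n\nThe answer is 4."
print(split_thinking(raw))  # ('2 + 2 is basic arithmetic.', 'The answer is 4.')
print(split_thinking("<think>\n\n</think>\n\nHello!"))  # ('', 'Hello!')
```

Keeping the reasoning separate makes it easy to show only the final answer in a chat UI while still logging the thinking trace for debugging.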
Reacted to Kseniase's post about 2 months ago:
RAG is evolving fast, keeping pace with cutting-edge AI trends. Today it is becoming more agentic and better at navigating complex structures such as hypergraphs.