Big news! NeuroBLAST, the outstanding new architecture, has officially arrived on HF! After three intense months of training my 1.9-billion-parameter SLM on my trusty RTX 3090 Ti, I'm happy to announce the results. While it's not perfect just yet, I've dedicated countless hours to optimizing costs while crafting clever layer connections that mimic the brain's centers. Plus, I've introduced a new memory-like layer that's sure to turn heads! I can't wait to dive deep into this journey in my upcoming blog post. Stay tuned for the full scoop!
Breaking news in Clinical AI: Introducing the OpenMed NER Model Discovery App on Hugging Face
OpenMed is back! Finding the right biomedical NER model just became as precise as a PCR assay!
I'm thrilled to unveil my comprehensive OpenMed Named Entity Recognition Model Discovery App that puts 384 specialized biomedical AI models at your fingertips.
Why This Matters in Healthcare AI: Traditional clinical text mining required hours of manual model evaluation. My Discovery App instantly connects researchers, clinicians, and data scientists with the exact NER models they need for their biomedical entity extraction tasks.
What You Can Discover:
✅ Pharmacological Models - Extract "chemical compounds", "drug interactions", and "pharmaceutical" entities from clinical notes
✅ Genomics & Proteomics - Identify "DNA sequences", "RNA transcripts", "gene variants", "protein complexes", and "cell lines"
✅ Pathology & Disease Detection - Recognize "pathological formations", "cancer types", and "disease entities" in medical literature
✅ Anatomical Recognition - Map "anatomical systems", "tissue types", "organ structures", and "cellular components"
✅ Clinical Entity Extraction - Detect "organism species", "amino acids", "protein families", and "multi-tissue structures"
Advanced Features:
✅ Intelligent Entity Search - Find models by specific biomedical entities (e.g., "Show me models detecting CHEM + DNA + Protein")
✅ Domain-Specific Filtering - Browse by Oncology, Pharmacology, Genomics, Pathology, Hematology, and more
✅ Model Architecture Insights - Compare BERT, RoBERTa, and DeBERTa implementations
✅ Real-Time Search - Auto-filtering as you type, no search buttons needed
✅ Clinical-Grade UI - Beautiful, intuitive interface designed for medical professionals
Ready to revolutionize your biomedical NLP pipeline?
Try it now: OpenMed/openmed-ner-models
Built with: Gradio, Transformers, Advanced Entity Mapping
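To give you a feel for what plugging one of these models into your pipeline looks like, here is a minimal sketch using the Transformers token-classification pipeline. The model id below is just a placeholder; substitute whichever repository you pick in the Discovery App, and keep in mind that the entity labels you get back depend on that model.

```python
from transformers import pipeline

# Placeholder model id - replace with a repository you pick in the Discovery App.
ner = pipeline(
    "token-classification",
    model="OpenMed/your-chosen-biomedical-ner-model",
    aggregation_strategy="simple",  # merge word pieces into whole entity spans
)

text = "Patients received 50 mg of imatinib targeting the BCR-ABL fusion protein."
for entity in ner(text):
    print(entity["entity_group"], "->", entity["word"], round(float(entity["score"]), 3))
```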
✔️ A modification of the cross-entropy loss function designed specifically for training LLMs.
✔️ A twist on standard cross-entropy that emphasizes outlier prediction errors and dynamically normalizes token-level variance.
✔️ More stable and efficient training, leading to models that generalize better.
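To make the idea concrete, here is a rough, simplified sketch (illustrative only, not the released implementation; the function name, threshold, and weighting scheme below are placeholders): per-token cross-entropy values are normalized by their current standard deviation, and tokens whose errors are statistical outliers are up-weighted.

```python
import torch
import torch.nn.functional as F

def outlier_aware_ce(logits, targets, z_threshold=2.0, outlier_weight=2.0, eps=1e-6):
    """Illustrative cross-entropy variant: normalize per-token losses by their
    current standard deviation and up-weight tokens with outlier errors."""
    # logits: (batch, seq_len, vocab_size); targets: (batch, seq_len)
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1), reduction="none"
    )
    # Dynamic token-level variance normalization within the current batch.
    std = per_token.std().clamp_min(eps)
    z = (per_token - per_token.mean()) / std
    # Emphasize tokens whose prediction error is a statistical outlier.
    weights = torch.where(
        z > z_threshold,
        torch.full_like(per_token, outlier_weight),
        torch.ones_like(per_token),
    )
    return (weights * per_token / std).mean()
```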
Check it out, give it a spin, and let me know what you think!
Licensed under the Apache 2.0 license and ready to use. Happy training!
After hours of working with GitHub Copilot to organize the code, I'm keen to announce the release of Blurred Thoughts Supervised-Finetuning (BT-SFT), a new method for fine-tuning LLMs to produce more diverse and creative responses.
BT-SFT introduces:
✅ A smart tokenization method that randomly masks tokens within <think> ... </think> tags, encouraging the model to generate diverse responses that align better with its own probability distribution instead of memorizing the thought process from distilled data.
✅ A reward function that ensures responses are well-structured.
Can we teach a model to think completely on its own without reinforcement learning? Actually, yes.
We can do straightforward supervised fine-tuning using a relatively simple trick: blurring a part of CoT thoughts. But why is this effective?
We observed that various models differ in their thinking processes, and fine-tuning one model on another model's thoughts (CoT) can sometimes be inefficient; it often results in the model simply memorizing reasoning rather than learning how to actually think.
I discovered that this process can still be efficient if we clearly indicate when the model should start and stop thinking, uncover only part of the CoT together with the expected answer, and blur the rest of the CoT. This approach allows the model to learn only a portion of the thought process while still arriving at the expected answer.
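To make the blurring concrete, here is a simplified sketch of the label-masking step (not the exact training script; it assumes <think> and </think> map to single token ids in your tokenizer). Tokens inside the thinking span are randomly hidden from the loss by setting their labels to -100, the usual ignore index for causal-LM fine-tuning, so the model only learns the uncovered portion of the CoT plus the final answer.

```python
import random

IGNORE_INDEX = -100  # standard label value ignored by the cross-entropy loss

def blur_thought_labels(input_ids, labels, tokenizer, blur_prob=0.5, seed=None):
    """Randomly mask ("blur") a fraction of the tokens between <think> and </think>
    so they do not contribute to the supervised fine-tuning loss."""
    rng = random.Random(seed)
    think_start = tokenizer.convert_tokens_to_ids("<think>")
    think_end = tokenizer.convert_tokens_to_ids("</think>")
    blurred = list(labels)
    inside_thought = False
    for i, token_id in enumerate(input_ids):
        if token_id == think_start:
            inside_thought = True
            continue
        if token_id == think_end:
            inside_thought = False
            continue
        if inside_thought and rng.random() < blur_prob:
            blurred[i] = IGNORE_INDEX  # this part of the CoT stays "blurred"
    return blurred
```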
To demonstrate this, you can check out my experimental BT-SFT of the meditsolutions/Llama-3.2-SUN-2.5B-chat model, which was fine-tuned on 151 million tokens from the Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B dataset.
Enjoy!
P.S. If you were curious enough to read this, leave me a comment. It's always nice to chat with open-minded and intelligent people.
OK, my 14B DeepSeek R1 merge with Qwen2.5 1M is really hot right now: it has 2.6k downloads! It's sitting pretty as the top trending model on the third page.