Big news! NeuroBLAST, the outstanding new architecture, has officially arrived on HF! After three intense months of training my 1.9-billion-parameter SLM on my trusty RTX 3090 Ti, I'm happy to announce the results. While it's not perfect just yet, I've dedicated countless hours to optimizing costs while crafting clever layer connections that mimic the brain's centers. Plus, I've introduced a new memory-like layer that's sure to turn heads! I can't wait to dive deep into this journey in my upcoming blog post. Stay tuned for the full scoop!
Breaking news in Clinical AI: Introducing the OpenMed NER Model Discovery App on Hugging Face
OpenMed is back! Finding the right biomedical NER model just became as precise as a PCR assay!
I'm thrilled to unveil my comprehensive OpenMed Named Entity Recognition Model Discovery App that puts 384 specialized biomedical AI models at your fingertips.
Why This Matters in Healthcare AI: Traditional clinical text mining required hours of manual model evaluation. My Discovery App instantly connects researchers, clinicians, and data scientists with the exact NER models they need for their biomedical entity extraction tasks.
What You Can Discover:
✅ Pharmacological Models - Extract "chemical compounds", "drug interactions", and "pharmaceutical" entities from clinical notes
✅ Genomics & Proteomics - Identify "DNA sequences", "RNA transcripts", "gene variants", "protein complexes", and "cell lines"
✅ Pathology & Disease Detection - Recognize "pathological formations", "cancer types", and "disease entities" in medical literature
✅ Anatomical Recognition - Map "anatomical systems", "tissue types", "organ structures", and "cellular components"
✅ Clinical Entity Extraction - Detect "organism species", "amino acids", "protein families", and "multi-tissue structures"
Advanced Features:
✅ Intelligent Entity Search - Find models by specific biomedical entities (e.g., "Show me models detecting CHEM + DNA + Protein")
✅ Domain-Specific Filtering - Browse by Oncology, Pharmacology, Genomics, Pathology, Hematology, and more
✅ Model Architecture Insights - Compare BERT, RoBERTa, and DeBERTa implementations
✅ Real-Time Search - Auto-filtering as you type, no search buttons needed
✅ Clinical-Grade UI - Beautiful, intuitive interface designed for medical professionals
Ready to revolutionize your biomedical NLP pipeline?
Try it now: OpenMed/openmed-ner-models
Built with: Gradio, Transformers, Advanced Entity Mapping
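To give you a feel for what plugging one of these models into your pipeline looks like, here is a minimal sketch using the Transformers token-classification pipeline. The model id below is just a placeholder; substitute whichever repository you pick in the Discovery App, and keep in mind that the entity labels you get back depend on that model.

```python
from transformers import pipeline

# Placeholder model id - replace with a repository you pick in the Discovery App.
ner = pipeline(
    "token-classification",
    model="OpenMed/your-chosen-biomedical-ner-model",
    aggregation_strategy="simple",  # merge word pieces into whole entity spans
)

text = "Patients received 50 mg of imatinib targeting the BCR-ABL fusion protein."
for entity in ner(text):
    print(entity["entity_group"], "->", entity["word"], round(float(entity["score"]), 3))
```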
✔️ A modification of the cross-entropy loss function designed specifically for training LLMs.
✔️ A twist on standard cross-entropy that emphasizes outlier prediction errors and dynamically normalizes token-level variance.
✔️ More stable and efficient training, leading to models that generalize better.
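To make the idea concrete, here is a rough, simplified sketch (illustrative only, not the released implementation; the function name, threshold, and weighting scheme below are placeholders): per-token cross-entropy values are normalized by their current standard deviation, and tokens whose errors are statistical outliers are up-weighted.

```python
import torch
import torch.nn.functional as F

def outlier_aware_ce(logits, targets, z_threshold=2.0, outlier_weight=2.0, eps=1e-6):
    """Illustrative cross-entropy variant: normalize per-token losses by their
    current standard deviation and up-weight tokens with outlier errors."""
    # logits: (batch, seq_len, vocab_size); targets: (batch, seq_len)
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1), reduction="none"
    )
    # Dynamic token-level variance normalization within the current batch.
    std = per_token.std().clamp_min(eps)
    z = (per_token - per_token.mean()) / std
    # Emphasize tokens whose prediction error is a statistical outlier.
    weights = torch.where(
        z > z_threshold,
        torch.full_like(per_token, outlier_weight),
        torch.ones_like(per_token),
    )
    return (weights * per_token / std).mean()
```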
Check it out, give it a spin, and let me know what you think!
Licensed under the Apache 2.0 license and ready to use. Happy training!
After hours of working with GitHub Copilot to organize the code, I'm keen to announce the release of Blurred Thoughts Supervised-Finetuning (BT-SFT), a new method for fine-tuning LLMs to produce more diverse and creative responses.
BT-SFT introduces:
✅ A smart tokenization method that randomly masks tokens within <think> ... </think> tags, encouraging the model to generate diverse responses that align better with its own probability distribution instead of memorizing the thought process from distilled data.
✅ A reward function that ensures responses are well-structured.
Can we teach a model to think completely on its own without reinforcement learning? Actually, yes.
We can do straightforward supervised fine-tuning using a relatively simple trick: blurring a part of CoT thoughts. But why is this effective?
We observed that various models differ in their thinking processes, and fine-tuning one model on another model's thoughts (CoT) can sometimes be inefficient; it often results in the model simply memorizing reasoning rather than learning how to actually think.
I discovered that this process can still be efficient if we clearly indicate when the model should start and stop thinking, uncover only part of the CoT together with the expected answer, and blur the rest of the CoT. This approach allows the model to learn only a portion of the thought process while still arriving at the expected answer.
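To make the blurring concrete, here is a simplified sketch of the label-masking step (not the exact training script; it assumes <think> and </think> map to single token ids in your tokenizer). Tokens inside the thinking span are randomly hidden from the loss by setting their labels to -100, the usual ignore index for causal-LM fine-tuning, so the model only learns the uncovered portion of the CoT plus the final answer.

```python
import random

IGNORE_INDEX = -100  # standard label value ignored by the cross-entropy loss

def blur_thought_labels(input_ids, labels, tokenizer, blur_prob=0.5, seed=None):
    """Randomly mask ("blur") a fraction of the tokens between <think> and </think>
    so they do not contribute to the supervised fine-tuning loss."""
    rng = random.Random(seed)
    think_start = tokenizer.convert_tokens_to_ids("<think>")
    think_end = tokenizer.convert_tokens_to_ids("</think>")
    blurred = list(labels)
    inside_thought = False
    for i, token_id in enumerate(input_ids):
        if token_id == think_start:
            inside_thought = True
            continue
        if token_id == think_end:
            inside_thought = False
            continue
        if inside_thought and rng.random() < blur_prob:
            blurred[i] = IGNORE_INDEX  # this part of the CoT stays "blurred"
    return blurred
```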
To demonstrate this, you can check out my experimental BT-SFT of the meditsolutions/Llama-3.2-SUN-2.5B-chat model, which was fine-tuned on 151 million tokens from the Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B dataset.
Enjoy!
P.S. If you were curious enough to read this, leave me a comment. It's always nice to chat with open-minded and intelligent people.
OK, my 14B DeepSeek R1 merge with Qwen2.5 1M is really hot right now: it has 2.6k downloads! It's sitting pretty as the top trending model on the third page.