Nicolay Rusnachenko

nicolay-r

AI & ML interests

Information Retrieval・Medical Multimodal NLP (🖼+📝) Research Fellow @BU_Research・software developer http://arekit.io・PhD in NLP

Recent Activity

liked a model 15 days ago

NX-AI/xLSTM-7b

Organizations

None yet

nicolay-r's activity

posted an update 1 day ago

📢 For those who work in textual IR and are experimenting with quick deployment of CoT / reasoning, the following update might be relevant. I am happy to announce a new version of bulk-chain, 0.25.3. It is a no-string framework for quickly applying reasoning schemas to your data.

https://github.com/nicolay-r/bulk-chain/releases/tag/0.25.3

The latest release brings major updates:
✅ Reworked model inference mechanism that works in streaming mode.
- Callback support for streaming mode (earlier only in the demo)
- Deployment of various clients (shell, tksheet; see attachment)
✅ Support for batching (earlier in API mode only)
✅ Optional caching of inferred data in SQLite (always enabled earlier)
- This makes it possible to launch small (but mighty) LLMs faster

🌟 Project: https://github.com/nicolay-r/bulk-chain
🌌 Providers: https://github.com/nicolay-r/nlp-thirdgate
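
The optional caching mentioned in the release notes can be illustrated with a minimal sketch. This is not bulk-chain's actual implementation: `infer_with_cache`, the `cache` table, and the `fake_llm` stand-in are hypothetical names used for illustration only, and the code relies solely on Python's standard `sqlite3` module.

```python
import sqlite3


def infer_with_cache(prompts, infer, db_path=None):
    """Run `infer` over prompts; reuse stored answers when db_path is given.

    db_path=None disables caching entirely, so no SQLite database is touched.
    """
    if db_path is None:
        return [infer(p) for p in prompts]

    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS cache (prompt TEXT PRIMARY KEY, response TEXT)"
    )
    results = []
    for p in prompts:
        row = con.execute(
            "SELECT response FROM cache WHERE prompt = ?", (p,)
        ).fetchone()
        if row is None:
            # Cache miss: call the model once and remember the answer.
            response = infer(p)
            con.execute(
                "INSERT INTO cache (prompt, response) VALUES (?, ?)", (p, response)
            )
            con.commit()
        else:
            response = row[0]
        results.append(response)
    con.close()
    return results


calls = []


def fake_llm(prompt):
    # Hypothetical stand-in for a real model call; records each invocation.
    calls.append(prompt)
    return prompt.upper()


answers = infer_with_cache(["good", "bad", "good"], fake_llm, db_path=":memory:")
print(answers, calls)
```

With caching enabled, the repeated prompt is answered from SQLite rather than by a second model call; with `db_path=None`, every prompt goes straight to the model, which avoids the database overhead for small, fast models.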

posted an update 15 days ago

The concept behind xLSTM has recently turned into the xLSTM-7B model, which showcases performance in the category of the similar-scale Gemma 7B, Llama 2 7B, and FalconMamba 7B, but with a higher-performing inference kernel.

Model: NX-AI/xLSTM-7b
Paper: https://arxiv.org/abs/2503.13427

1 reply

posted an update 21 days ago

📢 Several weeks ago Microsoft announced Phi-4. My most recent list of LLM wrappers had only a wrapper for Phi-2, so it was time to update! With this post, I am happy to share that a Phi-4 wrapper is now available at nlp-thirdgate for adopting Chain-of-Thought reasoning:

🤖 https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/transformers_phi4.py

📒 https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_phi4.py

Findings on adaptation: I was able to reproduce only the pipeline-based model launching. This version is for the textual LLM only. Microsoft also released a multimodal Phi-4, which is out of scope for this wrapper.

🌌 nlp-thirdgate: https://lnkd.in/ef-wBnNn

posted an update 22 days ago

📢 Delighted to announce the updated version of the no-string framework for chain-of-thought application over JSONL/CSV data:
https://github.com/nicolay-r/bulk-chain/releases/tag/0.25.2

🔧 Fixes:
- Fixed issues with batching mode
- Fixed problem with parsing and passing args in shell mode

⚠️ Limitation: batching mode is still available only via the API.

📒 Quick Start with Gemma-3 in batching mode: https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_gemma_3.ipynb

replied to their post 22 days ago

An important note: use the very latest version of bulk-chain from GitHub, which fixes the double-inference bug in batching.

posted an update 24 days ago

📢 With the recent release of Gemma-3, if you are interested in playing with textual chain-of-thought, the notebook below is a wrapper over the model (native transformers inference API) for passing a predefined schema of prompts in batching mode.
https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_gemma_3.ipynb

Limitation: the schema supports text only (for now), while Gemma-3 is a text+image-to-text model.

Model: google/gemma-3-1b-it
Provider: https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/transformers_gemma3.py

1 reply

reacted to onekq's post with 👀 25 days ago

The performance of deepseek-r1-distill-qwen-32b is abysmal. I know Qwen instruct (not coder) is quite poor at coding. As such, I have low expectations for other R1 repro works that are also based on Qwen instruct. onekq-ai/r1-reproduction-works-67a93f2fb8b21202c9eedf0b

This makes it particularly mysterious what went into QwQ-32B. Why did it work so well? Was it trained from scratch? Does anyone have insights about this?
onekq-ai/WebApp1K-models-leaderboard

5 replies

replied to ritvik77's post 25 days ago

@ritvik77, your plans sound good! Meanwhile, I am looking forward to adapting the 7B version for experiments in the radiology domain. Happy to read more on that once, and if, it gets to a paper, so I can add it to the survey of related advances.

replied to ritvik77's post 25 days ago

@ritvik77, excited to run into this! Are the paper and the studies behind it on arXiv or elsewhere?

reacted to ritvik77's post with 🔥 25 days ago

Try it out: ritvik77/Medical_Doctor_AI_LoRA-Mistral-7B-Instruct_FullModel

🩺 Medical Diagnosis AI Model - Powered by Mistral-7B & LoRA 🚀
🔹 Model Overview:
Base Model: Mistral-7B (7.7 billion parameters)
Fine-Tuning Method: LoRA (Low-Rank Adaptation)
Quantization: bnb_4bit (reduces memory footprint while retaining performance)
🔹 Parameter Details:
Original Mistral-7B Parameters: 7.7 billion
LoRA Fine-Tuned Parameters: 4.48% of total model parameters (340 million)
Final Merged Model Size (bnb_4bit Quantized): ~4.5GB
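
The figures above can be sanity-checked with a few lines of arithmetic. This is only a rough sketch using the numbers quoted in the post; the 4-bit estimate counts weights alone, and the assumption that the remaining gap to ~4.5GB comes from tensors not stored in 4 bits is not stated in the post.

```python
total_params = 7.7e9      # total parameters, as quoted in the post
lora_fraction = 0.0448    # 4.48% fine-tuned via LoRA, as quoted in the post

# Number of parameters touched by LoRA.
lora_params = total_params * lora_fraction

# Lower bound on storage if every weight took exactly 4 bits (0.5 bytes).
four_bit_bytes = total_params * 4 / 8

print(f"LoRA parameters: {lora_params / 1e6:.0f} million")
print(f"4-bit weights only: {four_bit_bytes / 1e9:.2f} GB")
```

The LoRA count lands close to the quoted 340 million, and the weights-only estimate sits below the reported ~4.5GB, as expected for a lower bound.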

This can help you build an AI agent for healthcare. If you need to fine-tune it for a JSON function/tool calling format, you can fine-tune it again on a medical function-calling dataset.

3 replies

reacted to clem's post with ❤️ 26 days ago

10,000+ models based on Deepseek R1 have been publicly shared on Hugging Face! Which ones are your favorites: https://huggingface.co/models?sort=trending&search=r1. Truly a game-changer!

reacted to Jaward's post with 🔥👀 26 days ago

Lightweight (nanoGPT) implementation of hybrid norm - an intuitive normalization method that combines the strengths of both pre-norm (i.e., QKV-norm in MHA) and post-norm in the feed-forward network.
Code: https://github.com/Jaykef/ai-algorithms/blob/main/hybrid_normalization.ipynb

replied to ychen's post 28 days ago

@ychen, I see. I was expecting your findings to be part of a PhD program. Take your time with publications then, since that is common during a PhD. It would be great to have a paper during the master's, and all the best with it!

replied to ychen's post 28 days ago

@ychen Good luck with your studies, and I am glad to have contributed to your progress. Are you on Google Scholar or GitHub with your own work in this domain?

posted an update 28 days ago

📢 For those who are interested in quick extraction of emotion causes in dialogues, below is a notebook that adopts the Flan-T5 model pre-trained on the FRIENDS dataset and is powered by the bulk-chain framework:

https://gist.github.com/nicolay-r/c8cfe7df1bef0c14f77760fa78ae5b5c

Why might it be interesting to check? The provided notebook supports batching mode for quick inference. In the case of Flan-T5-base, that would be the quickest option via an LLM.

📊 Evaluation results are available in model card:
nicolay-r/flan-t5-emotion-cause-thor-base

posted an update about 1 month ago

📢 Inspired by effective LLM usage, I am delighted to share an approach that might boost your reasoning process 🧠 Sharing the demo for an interactive launch of a Chain-of-Thought (CoT) schema in bash, with support for [optionally] predefined parameters as input files. The demo shows an application to extracting the author's sentiment towards an object in text.

This is a part of the most recent release of the bulk-chain 0.25.0.
⭐ https://github.com/nicolay-r/bulk-chain/releases/tag/0.25.1

How it works: it launches your CoT, asking for missing parameters if necessary. For each item of the chain, you receive the input prompt and the streamed output of your LLM.

To fix certain parameters in advance, you can pass them via --src:
- TXT files (using the filename as the parameter name)
- JSON dictionaries for multiple parameters

🤖 Model: meta-llama/Llama-3.3-70B-Instruct
🌌 Other models: https://github.com/nicolay-r/nlp-thirdgate
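
The `--src` conventions described above (a TXT file yields one parameter named after the file, a JSON dictionary yields several) can be sketched as follows. This is an illustrative assumption of how such loading might look, not bulk-chain's actual code: `load_params`, `missing_params`, and the demo file names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_params(sources):
    """Collect chain parameters from a list of file paths.

    *.txt  -> one parameter: the file stem is the name, the content is the value
    *.json -> several parameters: the top-level dictionary is merged in
    """
    params = {}
    for src in sources:
        path = Path(src)
        if path.suffix == ".txt":
            params[path.stem] = path.read_text(encoding="utf-8").strip()
        elif path.suffix == ".json":
            data = json.loads(path.read_text(encoding="utf-8"))
            if not isinstance(data, dict):
                raise ValueError(f"{path} must contain a JSON object")
            params.update(data)
        else:
            raise ValueError(f"Unsupported source type: {path}")
    return params


def missing_params(required, params):
    """Names that would still have to be asked for interactively."""
    return [name for name in required if name not in params]


# Demo with throwaway files (made-up content, for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    text_file = Path(tmp) / "text.txt"
    json_file = Path(tmp) / "extra.json"
    text_file.write_text("The service was great.\n", encoding="utf-8")
    json_file.write_text(json.dumps({"entity": "service"}), encoding="utf-8")
    demo = load_params([text_file, json_file])

print(demo)
print(missing_params(["text", "entity", "language"], demo))
```

Any name reported by `missing_params` is the kind of parameter the interactive mode would then prompt for.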

replied to ychen's post about 1 month ago

And to clarify your findings on those words, you can measure such a degree by applying tf-idf to your annotated texts. Basically, if you have a set of positive and negative responses from GPT-4o, you can calculate the so-called Semantic Orientation (SO) based on Pointwise Mutual Information (PMI). This would give consistency to your observations.
This comes from the relatively old classics: https://arxiv.org/pdf/cs/0212032
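
A minimal sketch of the SO-PMI idea for two sets of labeled responses is given below. It is a simplified variant rather than the exact procedure of the referenced paper: it uses document frequencies, whitespace tokenization, and additive smoothing, all of which are assumptions of this sketch, and the toy responses are made up. Since SO(w) = PMI(w, positive) - PMI(w, negative), the p(w) terms cancel and the score reduces to log2(p(w | positive) / p(w | negative)).

```python
import math
from collections import Counter


def semantic_orientation(pos_texts, neg_texts, smoothing=1.0):
    """Return {word: SO score}; positive scores lean towards the positive set."""

    def doc_freq(texts):
        # Count in how many texts each word occurs (once per text).
        df = Counter()
        for text in texts:
            df.update(set(text.lower().split()))
        return df

    df_pos, df_neg = doc_freq(pos_texts), doc_freq(neg_texts)
    n_pos, n_neg = len(pos_texts), len(neg_texts)

    scores = {}
    for word in set(df_pos) | set(df_neg):
        # Smoothed estimates of p(word | positive) and p(word | negative).
        p_w_pos = (df_pos[word] + smoothing) / (n_pos + 2 * smoothing)
        p_w_neg = (df_neg[word] + smoothing) / (n_neg + 2 * smoothing)
        scores[word] = math.log2(p_w_pos / p_w_neg)
    return scores


# Toy, made-up responses purely for illustration.
positive = ["recovery was smooth", "smooth and quick recovery"]
negative = ["recovery was rough", "a tough and rough week"]

so = semantic_orientation(positive, negative)
for word, score in sorted(so.items(), key=lambda item: item[1]):
    print(f"{word:10s} {score:+.2f}")
```

Words that appear equally often in both sets score zero, so only words with a skewed distribution stand out as candidates for a sentiment lexicon.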

replied to ychen's post about 1 month ago

Oh, that sounds interesting, and it looks like your focus is patients then, while mine was mainly mass media (authors) and dialogues (character conversations).

To make sure I understood you correctly frames are basically describing how a sentiment is related to entities in a sentence—is this a roughly correct understanding?

That's right, so it acts as a word that connects several parties (including entities), which are formally referred to as "roles", with a polarity score ("positive", "negative"). So in your case "sounds like", "rough", "tough" could be treated as negative by GPT-4o with respect to the topic of the question.

As for the frames, here is a more general definition you might be interested in checking (see the diagram):
https://aclanthology.org/D18-2008.pdf
The concept is the same, but instead of words they refer to them as triggers.

replied to ychen's post about 1 month ago

Thank you @ychen for sharing this! I was curious because the word frequency analysis you attempted is closely aligned with lexicon construction and frames in the domain of sentiment analysis. In particular, this could be extended to the analysis of a specific set of words, usually dubbed frames. Unlike plain words, frames go further by capturing the sentiment of a subject towards objects.

FYI: we cover something similar for news and a specific domain (Russian language) here: https://github.com/nicolay-r/RuSentiFrames