Nicolay Rusnachenko

nicolay-r

AI & ML interests

Information Retrieval・Medical Multimodal NLP (🖼+📝) Research Fellow @BU_Research・software developer http://arekit.io・PhD in NLP

Organizations

None yet

nicolay-r's activity

reacted to DawnC's post with ❤️ about 13 hours ago
🌟 PawMatchAI: Making Breed Selection More Intuitive! 🐕
Excited to share the latest update to this AI-powered companion for finding your perfect furry friend! The breed recommendation system just got a visual upgrade to help you make better decisions.

✨ What's New?
Enhanced breed recognition accuracy through strategic model improvements:
- Upgraded to a fine-tuned ConvNeXt architecture for superior feature extraction
- Implemented progressive layer unfreezing during training
- Optimized data augmentation pipeline for better generalization
- Achieved 8% improvement in breed classification accuracy

🎯 Key Features:
- Smart breed recognition powered by AI
- Visual matching scores with intuitive color indicators
- Detailed breed comparisons with interactive tooltips
- Lifestyle-based recommendations tailored to your needs

💭 Project Vision
Combining my passion for AI and pets, this project represents another step toward my goal of creating meaningful AI applications. Each update aims to make the breed selection process more accessible while improving the underlying technology.

👉 Try it now: DawnC/PawMatchAI

Your likes โค๏ธ on this space fuel this project's growth!

#AI #MachineLearning #DeepLearning #Pytorch #ComputerVision
reacted to sayakpaul's post with 🚀 about 13 hours ago
Commits speak louder than words 🤪

* 4 new video models
* Multiple image models, including SANA & Flux Control
* New quantizers -> GGUF & TorchAO
* New training scripts

Enjoy this holiday-special Diffusers release 🤗
Notes: https://github.com/huggingface/diffusers/releases/tag/v0.32.0
reacted to singhsidhukuldeep's post with 🧠 about 13 hours ago
Exciting News in AI: JinaAI Releases JINA-CLIP-v2!

The team at Jina AI has just released a groundbreaking multilingual multimodal embedding model that's pushing the boundaries of text-image understanding. Here's why this is a big deal:

🚀 Technical Highlights:
- Dual encoder architecture combining a 561M parameter Jina XLM-RoBERTa text encoder and a 304M parameter EVA02-L14 vision encoder
- Supports 89 languages with 8,192 token context length
- Processes images up to 512×512 pixels with 14×14 patch size
- Implements FlashAttention2 for text and xFormers for vision processing
- Uses Matryoshka Representation Learning for efficient vector storage
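Matryoshka Representation Learning trains the leading dimensions of an embedding to be useful on their own, so vectors can be truncated at query time. A generic numpy sketch of the truncate-and-renormalize step (illustrative only, not Jina's implementation):

```python
import numpy as np

def truncate_matryoshka(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row and re-normalize.

    Matryoshka-trained models concentrate the most important information
    in the leading dimensions, so the prefix remains a usable embedding.
    """
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

# e.g. shrink 1024-dim vectors to 256 dims -- a 75% storage reduction
full = np.random.default_rng(0).normal(size=(4, 1024))
small = truncate_matryoshka(full, 256)
print(small.shape)  # (4, 256)
```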

โšก๏ธ Under The Hood:
- Multi-stage training process with progressive resolution scaling (224→384→512)
- Contrastive learning using InfoNCE loss in both directions
- Trained on massive multilingual dataset including 400M English and 400M multilingual image-caption pairs
- Incorporates specialized datasets for document understanding, scientific graphs, and infographics
- Uses hard negative mining with 7 negatives per positive sample
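Bidirectional InfoNCE treats each aligned image-text pair as the positive and every other in-batch item as a negative, averaging the image→text and text→image cross-entropies. A generic numpy sketch (the temperature and batch construction here are assumptions for illustration, not Jina's training code):

```python
import numpy as np

def info_nce_bidirectional(img: np.ndarray, txt: np.ndarray,
                           temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch of aligned (image, text) embeddings."""
    # Cosine similarity matrix via L2-normalized dot products.
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature

    def xent(l: np.ndarray) -> float:
        # Cross-entropy with the matching pair (the diagonal) as the label.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return float(-np.mean(np.diag(log_probs)))

    # "In both directions": image->text over rows, text->image over columns.
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(1)
batch = rng.normal(size=(8, 16))
print(info_nce_bidirectional(batch, batch) >= 0.0)  # True
```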

📊 Performance:
- Outperforms previous models on visual document retrieval (52.65% nDCG@5)
- Achieves 89.73% image-to-text and 79.09% text-to-image retrieval on CLIP benchmark
- Strong multilingual performance across 30 languages
- Maintains performance even with 75% dimension reduction (256D vs 1024D)

🎯 Key Innovation:
The model solves the long-standing challenge of unifying text-only and multi-modal retrieval systems while adding robust multilingual support. Perfect for building cross-lingual visual search systems!

Kudos to the research team at Jina AI for this impressive advancement in multimodal AI!
reacted to ginipick's post with 🔥 about 13 hours ago
🎨 GiniGen Canvas-o3: Intelligent AI-Powered Image Editing Platform
Transform your images with precision using our next-generation tool that lets you extract anything from text to objects with simple natural language commands! 🚀
📌 Key Differentiators:

Intelligent Object Recognition & Extraction
• Freedom to select any target (text, logos, objects)
• Simple extraction via natural language commands ("dog", "signboard", "text")
• Ultra-precise segmentation powered by GroundingDINO + SAM
Advanced Background Processing
• AI-generated custom backgrounds for extracted objects
• Intuitive object size/position adjustment
• Multiple aspect ratio support (1:1, 16:9, 9:16, 4:3)
Progressive Text Integration
• Dual text placement: over or behind images
• Multi-language font support
• Real-time font style/size/color/opacity adjustment

🎯 Use Cases:

Extract logos from product images
Isolate text from signboards
Select specific objects from scenes
Combine extracted objects with new backgrounds
Layer text in front of or behind images

💫 Technical Features:

Natural language-based object detection
Real-time image processing
GPU acceleration & memory optimization
User-friendly interface

🎉 Key Benefits:

User Simplicity: Natural language commands for object extraction
High Precision: AI-powered accurate object recognition
Versatility: From basic editing to advanced content creation
Real-Time Processing: Instant result visualization

Experience the new paradigm of image editing with GiniGen Canvas-o3:

Seamless integration of multiple editing functions
Professional-grade results with consumer-grade ease
Perfect for social media, e-commerce, and design professionals

Whether you're extracting text from complex backgrounds or creating sophisticated visual content, GiniGen Canvas-o3 provides the precision and flexibility you need for modern image editing!

GO! ginigen/CANVAS-o3
reacted to InferenceIllusionist's post with 🔥 1 day ago
MilkDropLM-32b-v0.3: Unlocking Next-Gen Visuals ✨

Stoked to release the latest iteration of our MilkDropLM project! This new release is based on the powerful Qwen2.5-Coder-32B-Instruct model using the same great dataset that powered our 7b model.

What's new?

- Genome Unlocked: Deeper understanding of preset relationships for more accurate and creative generations.

- Preset Revival: Breathe new life into old presets with our upgraded model!

- Loop-B-Gone: Say goodbye to pesky loops and hello to smooth generation.

- Natural Chats: Engage in more natural sounding conversations with our LLM than ever before.

Released under Apache 2.0, because sharing is caring!

Try it out: InferenceIllusionist/MilkDropLM-32b-v0.3

Shoutout to @superwatermelon for his invaluable insights and collab, and to all those courageous members of the community who have tested and provided feedback!
reacted to ehristoforu's post with 🤗 1 day ago
โœ’๏ธ Ultraset - all-in-one dataset for SFT training in Alpaca format.
fluently-sets/ultraset

โ“ Ultraset is a comprehensive dataset for training Large Language Models (LLMs) using the SFT (instruction-based Fine-Tuning) method. This dataset consists of over 785 thousand entries in eight languages, including English, Russian, French, Italian, Spanish, German, Chinese, and Korean.

🤯 Ultraset solves the problem users face when selecting an appropriate dataset for LLM training. It combines the various types of data required to enhance a model's skills in areas such as text writing and editing, mathematics, coding, biology, medicine, finance, and multilingualism.

🤗 For effective use of the dataset, it is recommended to use only the "instruction," "input," and "output" columns and to train the model for 1-3 epochs. The dataset does not include DPO or Instruct data, making it suitable for training various types of LLM models.
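For reference, the Alpaca format arranges each record around exactly those three columns; a minimal sketch of turning one record into an SFT training prompt (the template wording below is the common Alpaca convention, not something prescribed by Ultraset):

```python
def format_alpaca(example: dict) -> str:
    """Render one Alpaca-format record into a single SFT training string,
    using only the instruction / input / output columns."""
    if example.get("input"):
        return (
            "### Instruction:\n{instruction}\n\n"
            "### Input:\n{input}\n\n"
            "### Response:\n{output}"
        ).format(**example)
    # Records with an empty "input" column omit the Input section.
    return (
        "### Instruction:\n{instruction}\n\n"
        "### Response:\n{output}"
    ).format(**example)

record = {"instruction": "Translate to French.",
          "input": "Good morning", "output": "Bonjour"}
print(format_alpaca(record).splitlines()[0])  # ### Instruction:
```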

โ‡๏ธ Ultraset is an excellent tool to improve your language model's skills in diverse knowledge areas.
reacted to aaditya's post with 🔥 2 days ago
Last Week in Medical AI: Top Research Papers/Models 🔥
🏅 (December 15 – December 21, 2024)

Medical LLM & Other Models
- MedMax: Mixed-Modal Biomedical Assistant
- Advanced multimodal instruction tuning
- Enhanced biomedical knowledge integration
- Comprehensive assistant capabilities
- MGH Radiology Llama 70B
- Specialized radiology focus
- State-of-the-art performance
- Enhanced report generation capabilities
- HC-LLM: Historical Radiology Reports
- Context-aware report generation
- Historical data integration
- Improved accuracy in diagnostics

Frameworks & Methods
- ReflecTool: Reflection-Aware Clinical Agents
- Process-Supervised Clinical Notes
- Federated Learning with RAG
- Query Pipeline Optimization

Benchmarks & Evaluations
- Multi-OphthaLingua
- Multilingual ophthalmology benchmark
- Focus on LMICs healthcare
- Bias assessment framework
- ACE-M3 Evaluation Framework
- Multimodal medical model testing
- Comprehensive capability assessment
- Standardized evaluation metrics

LLM Applications
- Patient-Friendly Video Reports
- Medical Video QA Systems
- Gene Ontology Annotation
- Healthcare Recommendations

Special Focus: Medical Ethics & AI
- Clinical Trust Impact Study
- Mental Health AI Challenges
- Hospital Monitoring Ethics
- Radiology AI Integration

Now you can watch and listen to the latest Medical AI papers daily on our YouTube and Spotify channels as well!

- Full thread in detail:
https://x.com/OpenlifesciAI/status/1870504774162063760
- YouTube link: youtu.be/SbFp4fnuxjo
- Spotify: https://t.co/QPmdrXuWP9
reacted to luigi12345's post with 👀 2 days ago
PERFECT FINAL PROMPT for Coding and Debugging.
Step 1: Generate the prompt that, if sent to you, will make you adjust the script so that it meets every one of the criteria it needs to meet to be 100% bug-free and perfect.

Step 2: adjust the script following the steps and instructions in the prompt created in Step 1.

reacted to prithivMLmods's post with 🤗 2 days ago
reacted to wenhuach's post with 👍 2 days ago
posted an update 2 days ago
📢 So far I have noticed that 🧠 reasoning with LLMs 🤖 tends to be more accurate in English than in other languages.
However, besides GoogleTrans and other open, transparent translators, I could not find an easy-to-use solution that avoids:
1. 🔴 Third-party framework installation
2. 🔴 Text chunking
3. 🔴 Lack of support for meta-annotations such as spans / objects / etc.

💎 To cope with the problem of IR over non-English texts, I am happy to share bulk-translate 0.25.0. 🎊

โญ https://github.com/nicolay-r/bulk-translate

bulk-translate is a tiny Python 🐍 no-strings framework for translating series of texts with pre-annotated fixed spans that remain invariant under translation.

It provides a 👨‍💻 Python 🐍 API for quick data translation with (optionally) annotated objects in texts (see figure below).
I made it as accessible as possible for RAG and/or LLM-powered downstream apps:
๐Ÿ“˜ https://github.com/nicolay-r/bulk-translate/wiki

All you have to do is provide an iterator of texts, where each text is either:
1. ✅ a string object, or
2. ✅ a list of strings and nested lists that represent spans (value + any ID data).

🤖 By default I provide a wrapper over googletrans, which you can override with your own 🔥
https://github.com/nicolay-r/bulk-translate/blob/master/models/googletrans_310a.py
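To make the two accepted input shapes concrete, here is a plain-Python illustration of the item kinds described above (the `fixed_spans` helper is hypothetical, for demonstration only; see the wiki for the actual API):

```python
# Two item shapes an input iterator may yield, per the description above:
texts = [
    # 1. A plain string: translated as a whole.
    "The weather in London is rainy today.",
    # 2. A list of strings mixed with nested lists that represent spans;
    #    a span carries its value plus any ID data, and the value stays
    #    invariant under translation.
    ["The flight to ", ["Heathrow", "ENTITY-1"], " was delayed."],
]

def fixed_spans(item):
    """Hypothetical helper: collect span values that must survive translation."""
    if isinstance(item, str):
        return []
    return [part[0] for part in item if isinstance(part, list)]

print(fixed_spans(texts[1]))  # ['Heathrow']
```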
posted an update 4 days ago
📢 If you're working in the relation extraction / character network domain, the following post should be relevant.
Excited to share the most recent milestone: the release of ARElight 0.25.0 🎊

Core library: https://github.com/nicolay-r/ARElight
Server: https://github.com/nicolay-r/ARElight-server

🔎 What is ARElight? It is a granular viewer of sentiments between entities in massively large documents and collections of texts.
In short, it allows you to extract contexts with mentioned object pairs for related prompting / classification.
In the slides below we illustrate the ARElight application for sentiment classification between object pairs in context.

We exploit DeepPavlov NER models + GoogleTranslate + a BERT-based classifier in the demo. The bash script for launching the quick demo illustrates how these components are applied.

The new update provides a series of new features:
✅ SQLite support for storing all the extracted samples
✅ Support for an enhanced GUI for content investigation
✅ Switch to external no-strings projects for NER and translation

Supplementary materials:
📜 Paper: https://link.springer.com/chapter/10.1007/978-3-031-56069-9_23
posted an update 11 days ago
📢 For those who wish to quickly start with reasoning / CoT applications over rows of tabular data with minimal dependencies, this post should be valuable.

🔎 I found that issuing a bulk of Chain-of-Thought (CoT) 🔗 queries to a remotely accessed LLM 🤖 (like OpenRouter / Replicate / OpenAI) may result in connection loss, which leads to exceptions 💥 and challenges with restoring the generated content.

Here is where I contribute with bulk-chain.
⭐ https://github.com/nicolay-r/bulk-chain

Currently working on the 0.24.3 version, in which I am happy to announce an API for developing apps based on CoT schema declaration in JSON (details in the attached images 📸)

All you have to do is:
✅ 1. Declare the CoT schema in JSON
✅ 2. Declare the model or use a preset
✅ 3. Launch the code

One example uses the ReplicateIO provider:
https://github.com/nicolay-r/bulk-chain/blob/master/ext/replicate.py

Each model wraps its inference call in a try-catch block.
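The three steps can be sketched generically as follows (a toy illustration of the schema-driven idea, not the actual bulk-chain API; the schema keys and the stub model are assumptions):

```python
import json

# Step 1: a CoT schema as JSON -- an ordered list of named prompts, where
# later prompts may reference earlier answers via {placeholders}.
SCHEMA = json.loads("""
{
  "steps": [
    {"name": "aspect",    "prompt": "What entity does this text discuss? {text}"},
    {"name": "sentiment", "prompt": "Given the entity '{aspect}', what is the sentiment of: {text}"}
  ]
}
""")

# Step 2: declare the model -- any callable str -> str will do.
def model(prompt: str) -> str:
    return f"<answer to: {prompt[:30]}...>"  # stub in place of a remote LLM

# Step 3: launch -- run the chain over one row, threading answers forward.
def run_chain(schema, model, row):
    context = dict(row)
    for step in schema["steps"]:
        context[step["name"]] = model(step["prompt"].format(**context))
    return context

result = run_chain(SCHEMA, model, {"text": "The new phone camera is superb."})
print(sorted(result))  # ['aspect', 'sentiment', 'text']
```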
posted an update 18 days ago
If you're coming to Information Retrieval with pre-processing techniques for LLMs, this post might be relevant.

Excited to share the release of the new 0.25.1 version of the AREkit library! 🎉🥳🎊🎁

AREkit represents an NLP toolkit of components for deep understanding of textual narratives through the extraction of inner relations via various techniques, including machine learning. This toolkit is helpful if you wish to structure your dataset for an IR problem: it allows you to turn your narratives into structured datasets of relations mentioned in sentences (sampling).

In the era of GenAI, AREkit contributes no-strings NLP pipelines and related elements for building your own NLP workflow with any third-party ML / LLM / API you wish.

🌟 https://github.com/nicolay-r/AREkit/releases/tag/v0.25.1-rc

In 0.25.1, the following steps were made towards this:
1. ✅ Native batching support for pipelines
2. 📦 Formed third-party projects for several text-preprocessing elements:
bulk-translate with GoogleTranslate or any other translator you wish: https://github.com/nicolay-r/bulk-translate
bulk-ner for NER with DeepPavlov models or any other you wish: https://github.com/nicolay-r/bulk-ner
bulk-chain for reasoning with any LLM you wish: https://github.com/nicolay-r/bulk-chain
* (AREkit support coming soon)
3. ❌ Removed conventional neural network components

📺 One of the demos is ARElight, which represents a granular viewer / GUI for network-based representation of information extracted from narratives:
ARElight: https://github.com/nicolay-r/ARElight
posted an update 25 days ago
📢 If you aim at processing spreadsheet data with the LLM Chain-of-Thought technique, this update might be valuable for you 💎

The updated 📦 bulk-chain 0.24.2, aimed at iterative processing of CSV/JSONL data with no-strings dependencies on third-party LLM frameworks, is out 🎉

📦: https://pypi.org/project/bulk-chain/0.24.2/
🌟: https://github.com/nicolay-r/bulk-chain
📘: https://github.com/nicolay-r/bulk-chain/issues/26

The key feature of bulk-chain is SQLite caching, which saves your time ⏰️ and money 💵 by guaranteeing no data loss when using remote LLM providers such as OpenAI, ReplicateIO, OpenRouter, etc.
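The caching idea itself can be sketched with the stdlib sqlite3 module (a generic memoization pattern for illustration, not bulk-chain's actual schema):

```python
import sqlite3

class LLMCache:
    """Persist completed LLM answers so a dropped connection never loses
    already-paid-for results; reruns skip cached prompts entirely."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (prompt TEXT PRIMARY KEY, answer TEXT)")

    def complete(self, prompt, llm_call):
        row = self.db.execute(
            "SELECT answer FROM cache WHERE prompt = ?", (prompt,)).fetchone()
        if row is not None:
            return row[0]                      # cache hit: no remote call
        answer = llm_call(prompt)              # remote call may raise ...
        self.db.execute("INSERT INTO cache VALUES (?, ?)", (prompt, answer))
        self.db.commit()                       # ... but committed answers survive
        return answer

calls = []
def fake_llm(p):
    calls.append(p)
    return p.upper()

cache = LLMCache()
cache.complete("hello", fake_llm)
print(cache.complete("hello", fake_llm), len(calls))  # HELLO 1
```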

🔧 This release includes the following updates:
✅ Now using a separate tiny iterator package, source-iter
✅ You can manually set the number of attempts to resume after a lost connection
✅ Other minor improvements

Quick start on Google Colab:
📙: https://colab.research.google.com/github/nicolay-r/bulk-chain/blob/master/bulk_chain_tutorial.ipynb

#reasoning #bulk #sqlite3 #chainofthought #cot #nlp #pipeline #nostrings #processing #data #dynamic #llm
posted an update about 1 month ago
📢 If you were interested in a quick translator application for a bunch of texts with fixed spans that should stay tolerant to translation, this post might be relevant! Delighted to share bulk_translate -- a framework for automatic text translation with pre-annotated fixed spans.

📦 https://pypi.org/project/bulk-translate/
🌟 https://github.com/nicolay-r/bulk-translate

🔑 Spans allow you to control objects in your texts, so that those objects remain tolerant to the translator. By default it provides an implementation for GoogleTranslate.

bulk_translate features:
✅ Native implementation of two translation modes:
- fast mode: exploits extra characters for grouping text parts into a single batch
- accurate mode: performs individual translation of each text part.
✅ No strings: you're free to adopt any LM / LLM backend.
Supports googletrans by default.

The initial release of the project supports fixed spans as text parts wrapped in square brackets [] with no inner space characters.
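That bracket convention is easy to illustrate (a sketch of the parsing idea, not bulk_translate's internal code):

```python
import re

def split_fixed_spans(text: str):
    """Split text into free parts (str) and fixed spans ([value]) that must
    pass through the translator unchanged. Spans contain no inner spaces,
    matching the initial-release convention described above."""
    parts = []
    for chunk in re.split(r"(\[\S+?\])", text):
        if not chunk:
            continue
        if chunk.startswith("[") and chunk.endswith("]"):
            parts.append([chunk[1:-1]])   # fixed span, kept verbatim
        else:
            parts.append(chunk)           # free text, sent to the translator
    return parts

print(split_fixed_spans("Flight to [Heathrow] was delayed."))
# ['Flight to ', ['Heathrow'], ' was delayed.']
```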

You can play with your data in CSV here on Google Colab:
📒 https://colab.research.google.com/github/nicolay-r/bulk-translate/blob/master/bulk_translate_demo.ipynb

๐Ÿ‘ This project is based on AREkit 0.25.1 pipelines for deployment lm-based workflows:
https://github.com/nicolay-r/AREkit
reacted to merve's post with ❤️ about 1 month ago
your Hugging Face profile now has your recent activities 🤗
posted an update about 1 month ago
📢 For those who are interested in extracting information about ✍️ authors from texts, happy to share a personal 📹 talk on Reading Between the Lines: adapting ChatGPT-related systems 🤖 for Implicit Information Retrieval

YouTube: https://youtu.be/nXClX7EDYbE

🔑 In this talk, we refer to IIR as information that is indirectly expressed by an ✍️ author / 👨 character / patient / any other entity.

📊 I cover 1️⃣ pre-processing and 2️⃣ reasoning techniques aimed at enhancing GenAI capabilities in IIR. To showcase the effectiveness of the proposed techniques, we experiment with such IIR tasks as Sentiment Analysis and Emotion Extraction / Cause Prediction.

In the pictures below, I share quick takeaways on pipeline construction and experiment results 🧪

Related paper cards:
📜 emotion-extraction: https://nicolay-r.github.io/#semeval2024-nicolay
📜 sentiment-analysis: https://nicolay-r.github.io/#ljom2024

Models:
nicolay-r/flan-t5-tsa-thor-base
nicolay-r/flan-t5-emotion-cause-thor-base


📓 PS: I got a hobby of advertising HPMoR ✨ 😁
posted an update about 2 months ago
📢 Have you ever wondered how exactly Transformers are capable of handling long input contexts?
I got a chance to tackle this through the long-document text summarization problem, and I am delighted to share the related survey and diagram for quick skimming below:

Preprint 📝 https://nicolay-r.github.io/website/data/preprint-AINL_2023_longt5_summarization.pdf
Springer 📝 https://link.springer.com/article/10.1007/s10958-024-07435-z

🎯 The aim of the survey was the development of a long-document summarizer for mass-media news in the Vietnamese language 🇻🇳

Sharing a quick overview of the performance of various LM-based solutions across several datasets, covering domain-oriented advances for the Vietnamese language (see attached screenshots).

As for the solution, we consider:
☑️ 1. Adapting the existing google/pegasus-cnn_dailymail to summarize a large dataset for arranging training
☑️ 2. Tuning google/long-t5-tglobal-large for generative summarization.

Implementation details:
🌟 https://github.com/nicolay-r/ViLongT5
(It is simpler to go with Hugging Face than with flaxformer, which has by now become a legacy engine)
reacted to m-ric's post with 🔥 about 2 months ago
> Oasis: First Real-Time Video Game Without a Game Engine! 🎮

DecartAI & Etched just released Oasis - a fully AI-generated video game running at 20 FPS (frames per second). The model takes keyboard inputs and generates everything - physics, rules, graphics - on the fly, without any game engine.

โšก๏ธ What makes this special? Current text-to-video models (Mochi-1, Sora, Kling) generate about 1 frame every 10-20 seconds (that's the kind of device I had to play LoL back in the day, thus my low rankings). Oasis is 200 times faster, making it the first playable AI-generated game.

โš™๏ธ Under the hood, it uses a vision transformer to encode space and a diffusion model to generate frames. The secret sauce is "dynamic noising" - a technique that keeps the video stable between frames.

Key insights:
โšก๏ธ Generates 20 FPS, vs 0.2 FPS for other DIT-based video models
โ€ฃ The specialized hardware Sohu developed by Etched allows to handle 10x more player than H100

🎮 Features real game mechanics
‣ Movement, jumping, item management
‣ Physics and lighting
‣ Procedurally generated worlds

โš ๏ธ Current limitations
โ€ฃ Blurry graphics at a distance
โ€ฃ Objects sometimes change appearance
โ€ฃ Memory issues in long sessions

Try it yourself, the playable demo is impressive! 👉 https://oasis.decart.ai/welcome
Code 👉 https://github.com/etched-ai/open-oasis
Read it in full 👉 https://oasis-model.github.io/