
Tirupati Venkata Sri Sai Rama Raju Penmatsa

rajuptvs

AI & ML interests

None yet

Recent Activity

liked a Space 22 days ago

AdithyaSK/rl-environments-guide

upvoted an article 4 months ago

LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family

commented on an article 8 months ago

How I Trained Action Chunking Transformer (ACT) on SO-101: My Journey, Gotchas, and Lessons



liked a Space 22 days ago

The ultimate guide to RL environments: building and scaling them in the LLM era

📝

178

Building and scaling RL environments for LLM training

upvoted an article 4 months ago

Article

LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family

lightonai • Jan 19 • 94

commented on How I Trained Action Chunking Transformer (ACT) on SO-101: My Journey, Gotchas, and Lessons 8 months ago

Incredible read!! Tempting me just a bit more to get my own SO-101 😀.

upvoted an article 8 months ago

Article

How I Trained Action Chunking Transformer (ACT) on SO-101: My Journey, Gotchas, and Lessons

sherryxychen • Sep 30, 2025 • 71

upvoted a collection about 1 year ago

Llama 4

Collection

Llama 4 release • 13 items • Updated Apr 29, 2025 • 737

liked a model about 1 year ago

Kortix/FastApply-1.5B-v1.0

Text Generation • 2B • Updated Dec 5, 2025 • 99 • 42

published a Space about 1 year ago

HuggingFaceTB SmolLM2 360M Instruct

🐠

reacted to Jaward's post with 🔥👀 about 1 year ago

Post

3746

Implemented a custom multimodal GRPO trainer that scales for Small VLMs, supports cpu and gpu with vllm + flash attention. Using SmolVLM-256M-Instruct reference & reward model, wasn’t trained for long btw, still got some sparks of “thinking”:)
Code: https://github.com/Jaykef/ai-algorithms/blob/main/grpo_multimodal_reasoner.ipynb

1 reply

liked a model about 1 year ago

Qwen/Qwen2.5-VL-32B-Instruct

Image-Text-to-Text • 33B • Updated Apr 14, 2025 • 384k • 488

updated a collection about 1 year ago

Multimodal

Collection

My Collection of models that I want to checkout • 1 item • Updated Mar 5, 2025

liked a model about 1 year ago

microsoft/Phi-4-multimodal-instruct

Automatic Speech Recognition • 6B • Updated Dec 10, 2025 • 531k • 1.6k

reacted to merve's post with 🔥 over 1 year ago

Post

2841

What a week! A recap for everything you missed ❄️
merve/nov-22-releases-673fbbcfc1c97c4f411def07
Multimodal ✨
> Mistral AI released Pixtral 124B, a gigantic open vision language model
> Llava-CoT (formerly known as Llava-o1) was released, a multimodal reproduction of o1 model by PKU
> OpenGVLab released MMPR: a new multimodal reasoning dataset
> Jina has released Jina-CLIP-v2 0.98B multilingual multimodal embeddings
> Apple released new SotA vision encoders AIMv2

LLMs 🦙
> AllenAI dropped a huge release of models, datasets and scripts for Tülu, a family of models based on Llama 3.1 aligned with SFT, DPO and a new technique they have developed called RLVR
> Jina has released embeddings-v3: new multilingual embeddings with longer context
> Hugging Face released SmolTalk: synthetic dataset used to align SmolLM2 using supervised fine-tuning
> Microsoft released orca-agentinstruct-1M-v1: a gigantic instruction dataset of 1M synthetic instruction pairs

Image Generation 🖼️
> Black Forest Labs released FLUX.1 Tools: four new models for different image modifications and two LoRAs to do image conditioning and better steer generations

Lastly, Hugging Face released a new library, Observers: a lightweight SDK for monitoring interactions with AI APIs and easily storing and browsing them 📚
$ pip install observers