Krishna Kaasyap

KrishnaKaasyap

AI & ML interests

Test Time Training; Multimodal & Inter-Modality Transfer Learning; Mechanistic Interpretability; Evolutionary Model Merging; Swarm Intelligence of multiple models with different architectures and different algorithms; MuZero approach to general tasks

Recent Activity

liked a model 25 days ago

Qwen/Qwen3-235B-A22B-Thinking-2507-FP8

liked a model about 1 month ago

moonshotai/Kimi-K2-Base

liked a model about 2 months ago

black-forest-labs/FLUX.1-Kontext-dev

upvoted a collection about 2 months ago

Gemma 3n Preview

Collection

4 items • Updated Jul 10 • 169

upvoted a collection 2 months ago

MiniMax-M1

Collection

MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. • 6 items • Updated Jul 3 • 110

upvoted a paper 2 months ago

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 117

upvoted a collection 2 months ago

Nemotron-H

Collection

Mamba-Transformer hybrid models • 10 items • Updated 5 days ago • 29

upvoted a collection 3 months ago

MedGemma Release

Collection

Collection of Gemma 3 variants for performance on medical text and image comprehension to accelerate building healthcare-based AI applications. • 7 items • Updated Jul 11 • 282

upvoted 2 collections 4 months ago

Qwen2.5-Omni

Collection

End-to-End Omni (text, audio, image, video, and natural speech interaction) model based on Qwen2.5 • 7 items • Updated 29 days ago • 155

Qwen3

Collection

84 items • Updated 13 days ago • 1.11k

upvoted a collection 5 months ago

Llama 4

Collection

Llama 4 release • 13 items • Updated Apr 29 • 607

upvoted an article 8 months ago

🐺🐦‍⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs

Article • Dec 4, 2024 • 79

upvoted a collection 8 months ago

QwQ

Collection

Qwen with Questions • 6 items • Updated 29 days ago • 98

upvoted an article 9 months ago

Bridging the Gap Between Physical Numerical Simulations and Machine Learning: Introducing The Well

Article • Dec 2, 2024 • 18

upvoted 2 collections 9 months ago

🎬 Video models

Collection

text-to-video & image-to-video models released by the Chinese community • 22 items • Updated 22 days ago • 4

🧠 Reasoning Models

Collection

8 items • Updated 22 days ago • 39

upvoted a collection 10 months ago

Llama-3.1-Nemotron-70B

Collection

SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated 5 days ago • 155

upvoted a paper 12 months ago

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22, 2024 • 52

upvoted 2 collections 12 months ago

Jamba 1.5

Collection

The AI21 Jamba family of models are state-of-the-art, hybrid SSM-Transformer instruction following foundation models • 2 items • Updated Mar 6 • 87

Magnum v2 123b

Collection

3 items • Updated Aug 21, 2024 • 6

upvoted a collection about 1 year ago

DeepSeek-V2

Collection

8 items • Updated Jan 3 • 32

upvoted an article about 1 year ago

Llama 3.1 - 405B, 70B & 8B with multilinguality and long context

Article • Jul 23, 2024 • 237

upvoted a paper about 1 year ago

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Paper • 2408.05211 • Published Aug 9, 2024 • 50
