Linoy Tsaban PRO
linoyts
AI & ML interests
None yet
Recent Activity
upvoted a paper (1 minute ago): HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters
upvoted a paper (1 minute ago): HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation
new activity (4 minutes ago): tencent/HunyuanVideo-Avatar: add pipeline tag
Organizations
linoyts's activity

reacted to clem's post with 🔥 (1 day ago)

reacted to sayakpaul's post (5 days ago)
Post
2204
Diffusers supports a good variety of quantization backends. It can be challenging to navigate through them, given the complex nature of diffusion pipelines in general.
So, @derekl35 set out to write a comprehensive guide that puts users in the front seat. Explore the different backends we support, learn the trade-offs they offer, and finally, check out the cool space we built that lets you compare quantization results.
Give it a go here:
https://lnkd.in/gf8Pi4-2
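As a quick taste of what the guide covers, here is a minimal sketch of the bitsandbytes 4-bit backend in Diffusers; the FLUX checkpoint id and generation settings are illustrative assumptions, and the other backends (GGUF, torchao, Quanto) follow the same quantization_config pattern.

import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

ckpt_id = "black-forest-labs/FLUX.1-dev"  # illustrative checkpoint

# Quantize only the transformer, usually the memory hot spot in a diffusion pipeline.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    ckpt_id,
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

# Build the rest of the pipeline around the quantized transformer.
pipe = FluxPipeline.from_pretrained(ckpt_id, transformer=transformer, torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # trades some speed for lower peak VRAM

image = pipe("a watercolor fox in a snowy forest", num_inference_steps=28).images[0]
image.save("fox.png")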

reacted to sayakpaul's post with 🔥 (6 days ago)
Post
1621
Despite the emergence of architectures that combine an LLM and a DiT for T2I synthesis, this design space remains severely understudied.
This was done long ago and got into CVPR25 -- super excited to finally share it now, along with the data and code ♥️
We explore several architectural choices that affect this design. We provide an open & reproducible training recipe that works at scale.
Works like Playground v3 have already explored a deep fusion between an LLM and a DiT, sharing their representations through layerwise attention. They exhibit excellent performance on T2I.
Despite its compelling results and other performance virtues, this design remains underexplored, which is what we want to improve on in our work. Specifically, we take a pre-trained LLM (Gemma-2B) and a trainable DiT, and set out to explore what makes a "good deep fusion" between the two for T2I.
We explore several key questions in the work, such as:
Q1: How should we do attention? We considered several alternatives; PixArt-Alpha-like attention (cross-attention) is very promising.
Q2: Should we incorporate additional text modulation?
Q3: Can we eliminate timestep conditioning?
Q4: How do we do positional encodings?
Q5: Do instruction-tuned LLMs help deep fusion?
Q6: Would using a decoder LLM from a multimodal model be helpful?
Q7: Does using a better variant of Gemma help?
Based on these findings, we arrive at FuseDiT with the following components on top of the base architecture.
* No AdaLN-Zero modules
* 1D + 2D-RoPE
* Gemma 2 2B, adjusting DiT configurations accordingly
We trained FuseDiT on a mixture from CC12M, JourneyDB, & SA (~26M image-text pairs) for 800 steps. While not the best model, it's encouraging to develop something in a guided manner using open datasets.
To learn more (code and models are all available), please check out the paper:
https://lnkd.in/gg6qyqZX.
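For readers who want a concrete picture of the cross-attention option from Q1, here is a toy PyTorch sketch of a DiT block that conditions on frozen LLM hidden states, PixArt-Alpha style. It is not the released FuseDiT code, and every name and dimension is a made-up placeholder.

import torch
import torch.nn as nn

class CrossAttnDiTBlock(nn.Module):
    """Toy DiT block: self-attention over image tokens, then cross-attention to LLM states."""

    def __init__(self, dim: int = 1024, llm_dim: int = 2048, num_heads: int = 16):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Cross-attention: image tokens are queries, projected LLM hidden states are keys/values.
        self.proj_llm = nn.Linear(llm_dim, dim)
        self.norm2 = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor, llm_hidden: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]
        ctx = self.proj_llm(llm_hidden)  # (batch, text tokens, dim)
        h = self.norm2(x)
        x = x + self.cross_attn(h, ctx, ctx, need_weights=False)[0]
        return x + self.mlp(self.norm3(x))

# Smoke test with random tensors standing in for patchified latents and Gemma hidden states.
block = CrossAttnDiTBlock()
img_tokens = torch.randn(2, 256, 1024)   # (batch, image tokens, dim)
text_states = torch.randn(2, 77, 2048)   # (batch, text tokens, llm_dim)
print(block(img_tokens, text_states).shape)  # torch.Size([2, 256, 1024])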

reacted to AdinaY's post (6 days ago)
Post
2692
ByteDance is absolutely cooking lately 🔥
BAGEL 🥯 a 7B-active-parameter open multimodal foundation model by the ByteDance Seed team.
ByteDance-Seed/BAGEL-7B-MoT
✨ Apache 2.0
✨ Outperforms top VLMs (Qwen2.5-VL & InternVL-2.5)
✨ Mixture-of-Transformer-Experts + dual encoders
✨ Trained on trillions of interleaved tokens

reacted to loubnabnl's post with ❤️ (9 days ago)
Post
2508
SmolVLM is now available on PocketPal: you can run it offline on your smartphone to interpret the world around you. 📱
And check out this real-time camera demo by @ngxson, powered by llama.cpp:
https://github.com/ngxson/smolvlm-realtime-webcam
https://x.com/pocketpal_ai

reacted to AdinaY's post (14 days ago)
Post
2504
Matrix Game 🎮 an interactive foundation model for controllable game world generation, released by Skywork AI.
Skywork/Matrix-Game
✨ 17B, MIT licensed
✨ Diffusion-based image-to-world video generation via keyboard & mouse input
✨ GameWorld Score benchmark for Minecraft world models
✨ Massive Matrix Game Dataset with fine-grained action labels

reacted to merve's post with 🔥 (15 days ago)
Post
5008
VLMs 2025 UPDATE 🔥
We just shipped a blog post on all the latest in vision language models, including
- GUI agents, agentic VLMs, omni models
- multimodal RAG
- video LMs
- smol models
...and more! https://huggingface.co/blog/vlms-2025

reacted to AdinaY's post (20 days ago)
Post
3926
ACE-Step 🎵 a music generation foundation model released by StepFun & ACEStudio
Model: ACE-Step/ACE-Step-v1-3.5B
Demo: ACE-Step/ACE-Step
✨ 3.5B, Apache 2.0 licensed
✨ 115× faster than LLMs (4-min music in 20s on A100)
✨ Diffusion + DCAE + linear transformer = speed + coherence
✨ Supports voice cloning, remixing, lyric editing & more

reacted to RiverZ's post (21 days ago)
Post
6408
🔥 We're thrilled to share some exciting news about ICEdit! Currently, the ICEdit app (RiverZ/ICEdit) has soared to second place on the weekly trending list for Hugging Face Spaces, just trailing behind Qwen3. What's more, it also holds the second position on the overall Spaces trending list. This achievement wouldn't have been possible without your incredible support and love. A huge thank you to each and every one of you ❤!
The ICEdit community has been incredibly active, and we've seen a plethora of amazing ComfyUI workflows being shared. For instance, with the help of ComfyUI-nunchaku, you can run ICEdit locally with just 4GB of VRAM. This makes it much more accessible for those with limited hardware resources.
If you're interested in the details, please head over to our repository. We highly encourage you to give these workflows a try and explore the creative possibilities that ICEdit offers.
Github Repo: https://github.com/River-Zhang/ICEdit
Hugging Face Space: RiverZ/ICEdit

reacted to nyuuzyou's post with 🔥 (21 days ago)
Post
3618
🖼️ PublicDomainFiles.com Collection - nyuuzyou/publicdomainfiles
Collection of 206,204 Public Domain multimedia files featuring:
- Comprehensive metadata: title, description, creator name, keywords, original page URL, and more.
- Contains various media types including images, clip art, artwork, fonts, videos, and TV shows.
- All content explicitly released into the public domain under the CC0 license.
- Organized in a single train split with 206,204 entries.
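A minimal sketch of peeking at the data with the datasets library, streaming so you don't pull all 206,204 files at once; the exact column names are whatever the dataset card defines.

from datasets import load_dataset

# Stream the single train split rather than downloading everything up front.
ds = load_dataset("nyuuzyou/publicdomainfiles", split="train", streaming=True)

# Peek at one record; columns follow the metadata listed above
# (title, description, creator name, keywords, original page URL, ...).
first = next(iter(ds))
print(sorted(first.keys()))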

posted an update (22 days ago)
Post
3014
FramePack is hands down one of the best OS releases in video generation 🤯
✅ fully open sourced + amazing quality + reduced memory + improved speed
but even more - it's gonna facilitate *soooo* many downstream applications
like this version adapted for landscape rotation: https://huggingface.co/spaces/tori29umai/FramePack_rotate_landscape

reacted to RiverZ's post with 🔥 (22 days ago)
Post
3044
Excited to Share Our Latest Work: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer～
Daily Paper: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)
Code is now open source!
🔥 Huggingface DEMO: RiverZ/ICEdit
Project Website: https://river-zhang.github.io/ICEdit-gh-pages/
GitHub Repository: https://github.com/River-Zhang/ICEdit/blob/main/scripts/gradio_demo.py
🤗 Huggingface: sanaka87/ICEdit-MoE-LoRA
arXiv Paper: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)
🔥 Why it's cool:
- Achieves high-quality, multi-task image editing.
- Uses only 1% of the training parameters and 0.1% of the training data compared to existing methods, making it extremely efficient
- Beats several commercial models on background preservation, ID control, and consistency
- Open-source, low-cost, faster, and stronger: think of it as the "DeepSeek of image editing"
We also implemented a Gradio demo app, available directly in our GitHub repo! And we made a flashy demo video; happy to send it your way!

reacted to abidlabs's post with ❤️ (25 days ago)
Post
4538
HOW TO ADD MCP SUPPORT TO ANY 🤗 SPACE
Gradio now supports MCP! If you want to convert an existing Space, like this one hexgrad/Kokoro-TTS, so that you can use it with Claude Desktop / Cursor / Cline / TinyAgents / or any LLM that supports MCP, here's all you need to do:
1. Duplicate the Space (in the Settings Tab)
2. Upgrade the Gradio sdk_version to 5.28 (in the README.md)
3. Set mcp_server=True in launch()
4. (Optionally) add docstrings to the function so that the LLM knows how to use it, like this:
def generate(text, speed=1):
    """
    Convert text to speech audio.
    Parameters:
        text (str): The input text to be converted to speech.
        speed (float, optional): Playback speed of the generated speech.
    """
That's it! Now your LLM will be able to talk to you 🤯
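Putting the steps together, a minimal sketch of what the duplicated Space's app.py could end up looking like; the silent-audio body is a stand-in for the Space's real TTS call, and only the docstring plus mcp_server=True are the MCP-relevant parts.

import gradio as gr
import numpy as np

def generate(text, speed=1):
    """
    Convert text to speech audio.
    Parameters:
        text (str): The input text to be converted to speech.
        speed (float, optional): Playback speed of the generated speech.
    """
    # Stand-in body: a real Space would call its TTS model here.
    sr = 22050
    return (sr, np.zeros(sr, dtype=np.float32))  # one second of silence as (sample_rate, waveform)

demo = gr.Interface(fn=generate, inputs=["text", "number"], outputs="audio")
demo.launch(mcp_server=True)  # requires gradio sdk_version 5.28+ (step 2 above)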

reacted to ginipick's post (25 days ago)
Post
3186
🎨 Renoir Studio: Impressionist Masterpieces Reborn Through AI ✨
Experience Renoir's Magical Brushstrokes with AI!
Try it now: ginigen/flux-lora-renoir
Model page: openfree/pierre-auguste-renoir
Collection: openfree/painting-art-ai-681453484ec15ef5978bbeb1
Hello, AI art enthusiasts!
Today I'm introducing a special model - Pierre-Auguste Renoir Studio. Create your own beautiful artwork in the style of the 19th-century French Impressionist master! 🖼️
✨ Why Renoir's Style?
Renoir is famous for his luminous colors and soft brushstrokes. His works feature:
Warm sunshine and dancing light
👨‍👩‍👧‍👦 The beauty of everyday life and joyful moments
🌸 Vibrant nature and portraits of beautiful women
🎭 Lively Parisian social gatherings and outdoor scenes
🔬 Technical Features
This model was developed as a FLUX-based model trained on a curated collection of high-resolution masterpieces from renowned global artists. The LoRA fine-tuning process leveraged exceptional-quality open-access imagery released by prestigious institutions, including the Art Institute of Chicago. The resulting model demonstrates remarkable capability in capturing the nuanced artistic techniques and stylistic elements across diverse historical art movements! 🧠💫
How to Use
1. Describe your desired scene in the prompt box
2. Add the "renoir" keyword at the end (this is the trigger keyword!)
3. Click the 'Generate' button
4. Enjoy your ideas reborn in Renoir's style!
💡 Recommended Prompt Examples
"Elegant ladies enjoying a picnic in a sunlit garden, wearing pastel dresses and hats renoir"
"People boating by a riverbank, light reflecting on water, warmth of summer renoir"
"Paris cafe terrace, people chatting over coffee, evening sunset renoir"
Now It's Your Turn!
#AI #Renoir #ArtificialIntelligence #HuggingFace #FLUX #LoRA

reacted to sanaka87's post with 🔥 (25 days ago)
Post
2588
Excited to Share Our Latest Work: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer～
Daily Paper: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)
Code is now open source!
🔥 Huggingface DEMO: RiverZ/ICEdit
Project Website: https://river-zhang.github.io/ICEdit-gh-pages/
GitHub Repository: https://github.com/River-Zhang/ICEdit/blob/main/scripts/gradio_demo.py
🤗 Huggingface: sanaka87/ICEdit-MoE-LoRA
arXiv Paper: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)
🔥 Why it's cool:
- Achieves high-quality, multi-task image editing.
- Uses only 1% of the training parameters and 0.1% of the training data compared to existing methods, making it extremely efficient
- Beats several commercial models on background preservation, ID control, and consistency
- Open-source, low-cost, faster, and stronger: think of it as the "DeepSeek of image editing"
We also implemented a Gradio demo app, available directly in our GitHub repo! And we made a flashy demo video; happy to send it your way!

reacted to jasoncorkill's post (29 days ago)
Post
5528
Building Better Evaluations: 32K Image Annotations Now Available
Today, we're releasing an expanded version: 32K images annotated with 3.7M responses from over 300K individuals, collected in under two weeks using the Rapidata Python API.
Rapidata/text-2-image-Rich-Human-Feedback-32k
A few months ago, we published one of our most-liked datasets, with 13K images based on the @data-is-better-together dataset, following Google's research on "Rich Human Feedback for Text-to-Image Generation" (https://arxiv.org/abs/2312.10240). It collected over 1.5M responses from 150K+ participants.
Rapidata/text-2-image-Rich-Human-Feedback
In the examples below, users highlighted words from prompts that were not correctly depicted in the generated images. Higher word scores indicate more frequent issues. If an image captured the prompt accurately, users could select [No_mistakes].
We're continuing to work on large-scale human feedback and model evaluation. If you're working on related research and need large, high-quality annotations, feel free to get in touch: [email protected].
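To explore the annotations yourself, a minimal sketch with the datasets library; the split name and the per-column schema are assumptions, so check the dataset card for the exact fields.

from datasets import load_dataset

# Load the expanded 32K release announced above.
ds = load_dataset("Rapidata/text-2-image-Rich-Human-Feedback-32k", split="train")

print(ds)                      # row count and column names
example = ds[0]
print(sorted(example.keys()))  # per-image fields (prompt, image, word-level feedback, ...)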

reacted to AdinaY's post with 🔥 (29 days ago)
Post
5124
Kimi-Audio 🎧 an OPEN audio foundation model released by Moonshot AI
moonshotai/Kimi-Audio-7B-Instruct
✨ 7B
✨ 13M+ hours of pretraining data
✨ Novel hybrid input architecture
✨ Universal audio capabilities (ASR, AQA, AAC, SER, SEC/ASC, end-to-end conversation)

reacted to samihalawa's post with 🔥 (about 1 month ago)
Post
2422
SkyReels-V2 INFINITE VIDEO ♾️🎬 an UNLIMITED-duration video generation model by Skywork.
> "Finally, it's here. An open-source model that achieves what we've all been waiting for: infinite-length videos." 😮
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought (2504.05599)
Model: Skywork/SkyReels-V2-T2V-14B-720P
✨ 1.3B & 14B
✨ Generates infinite-length videos using Diffusion Forcing with diffusion models + autoregressive methods

reacted to AdinaY's post with 🔥 (about 1 month ago)
Post
3509
MAGI-1, the autoregressive diffusion video model released by Sand AI
sand-ai/MAGI-1
✨ 24B, Apache 2.0 licensed
✨ Strong temporal consistency
✨ Benchmark-topping performance