Linoy Tsaban PRO

linoyts

AI & ML interests

None yet

Recent Activity

Organizations

Hugging Face, 🧨Diffusers, Hugging Face Internal Testing Organization, Huggingface Projects, Snap Research, Weizmann Institute of Science, Editing Images, leditsplusplus, Latent Consistency, Editing Audio, Women on Hugging Face, +RAIN film festival, diffusers-internal-dev, rnri-inversion, Snapchat Inc., Latent Explorers, open/ acc, RF Inversion, FlowEdit, LTX Collaborations, CRINGE, Réflexion IA, IP Composer, Transformers Community, Inference Endpoints Images, Hugging Face MCP Course

linoyts's activity

reacted to clem's post with 🔥 1 day ago
Playing with Veo3 this morning. Share your prompt if you want me to create videos for you (bonus points if they funnily reference HF/open-source). These videos are "a cat on the moon rapping 'I love Hugging Face'"!
reacted to sayakpaul's post with 🤗 5 days ago
Diffusers supports a good variety of quantization backends. It can be challenging to navigate through them, given the complex nature of diffusion pipelines in general.

So, @derekl35 set out to write a comprehensive guide that puts users in the front seat. Explore the different backends we support, learn the trade-offs they offer, and finally, check out the cool space we built that lets you compare quantization results.

Give it a go here:
https://lnkd.in/gf8Pi4-2
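
To make the memory side of the trade-off concrete, here is a rough sketch. It is an illustration under assumptions, not the guide's own code: the checkpoint id `black-forest-labs/FLUX.1-dev` and the 12B parameter count are picked for the example and do not come from the post, and the helper names are made up. The estimate counts weights only and ignores quantization metadata, activations, and other pipeline components. The loader uses the bitsandbytes backend through `BitsAndBytesConfig` in Diffusers and needs a CUDA GPU.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def load_4bit_transformer(repo_id: str = "black-forest-labs/FLUX.1-dev"):
    """Load a diffusion transformer with 4-bit bitsandbytes quantization.

    Needs diffusers, torch, bitsandbytes and a CUDA GPU. Imports are lazy
    so the estimate above can be used without them. The default repo_id is
    an assumption chosen for illustration.
    """
    import torch
    from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

    config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    return FluxTransformer2DModel.from_pretrained(
        repo_id,
        subfolder="transformer",
        quantization_config=config,
        torch_dtype=torch.bfloat16,
    )


# Weight-only footprint of a hypothetical 12B-parameter transformer.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(12e9, bits):.0f} GB")
```

Lower bit widths shrink the weights roughly in proportion, but output quality and speed can change, which is the part worth comparing across backends.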
reacted to sayakpaul's post with 🔥 6 days ago
Despite the emergence of combining LLM and DiT architectures for T2I synthesis, its design remains severely understudied.

This was done long ago and got into CVPR25 -- super excited to finally share it now, along with the data and code ♥️

We explore several architectural choices that affect this design. We provide an open & reproducible training recipe that works at scale.

Works like Playground v3 have already explored a deep fusion between an LLM and a DiT, sharing their representations through layerwise attention. They exhibit excellent performance on T2I.

Despite their compelling results and other performance virtues, this design space remains largely unexplored, which is what we want to improve in our work. Specifically, we take a pre-trained LLM (Gemma-2B) and a trainable DiT, and set out to explore what makes a "good deep fusion" between the two for T2I.

We explore several key questions in the work, such as:

Q1: How should we do attention? We considered several alternatives. PixArt-Alpha-like attention (cross-attention) is very promising.
Q2: Should we incorporate additional text modulation?
Q3: Can we eliminate timestep conditioning?
Q4: How do we do positional encodings?
Q5: Do instruction-tuned LLMs help deep fusion?
Q6: Would using a decoder LLM from a multimodal model be helpful?
Q7: Does using a better variant of Gemma help?

Based on the above findings, we arrive at FuseDiT, which makes the following choices on top of the base architecture:

* No AdaLN-Zero modules
* 1D + 2D-RoPE
* Gemma 2 2B, adjusting DiT configurations accordingly
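
For readers unfamiliar with the "1D + 2D-RoPE" item, here is a generic sketch of rotary position embeddings. It is not FuseDiT's code: the function names are hypothetical, the axial split used for the 2D variant is one common formulation, and the post does not say which tokens get which variant. Each pair of features is rotated by an angle that grows with position, so dot products between rotated vectors depend only on the relative offset.

```python
import math
from typing import List


def rope_1d(x: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate consecutive feature pairs of x by position-dependent angles."""
    dim = len(x)
    if dim % 2:
        raise ValueError("feature dimension must be even")
    out: List[float] = []
    for i in range(0, dim, 2):
        # Pair (i, i+1) uses frequency base^(-i/dim); earlier pairs rotate faster.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out


def rope_2d(x: List[float], row: int, col: int, base: float = 10000.0) -> List[float]:
    """Axial 2D variant: first half of x encodes the row, second half the column."""
    if len(x) % 4:
        raise ValueError("feature dimension must be divisible by 4")
    half = len(x) // 2
    return rope_1d(x[:half], row, base) + rope_1d(x[half:], col, base)


def dot(a: List[float], b: List[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]
# Both pairs of positions are 2 apart, so the two scores should match.
print(dot(rope_1d(q, 3), rope_1d(k, 1)), dot(rope_1d(q, 5), rope_1d(k, 3)))
```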

We trained FuseDiT on a mixture from CC12M, JourneyDB, & SA (~26M image-text pairs) for 800 steps. While not the best model, it's encouraging to develop something in a guided manner using open datasets.

To know more (code, models, all are available), please check out the paper:
https://lnkd.in/gg6qyqZX.
reacted to AdinaY's post with 🚀 6 days ago
ByteDance is absolutely cooking lately 🔥

BAGEL 🥯 7B active parameter open multimodal foundation model by Bytedance Seed team.

ByteDance-Seed/BAGEL-7B-MoT

✨ Apache 2.0
✨ Outperforms top VLMs (Qwen2.5-VL & InternVL-2.5)
✨ Mixture-of-Transformer-Experts + dual encoders
✨ Trained on trillions of interleaved tokens
reacted to loubnabnl's post with ❤️ 9 days ago
reacted to AdinaY's post with 🚀 14 days ago
Matrix Game 🎮 an interactive foundation model for controllable game world generation, released by Skywork AI.

Skywork/Matrix-Game

✨ 17B, MIT licensed
✨ Diffusion-based image-to-world video generation via keyboard & mouse input
✨ GameWorld Score benchmark for Minecraft world models
✨ Massive Matrix Game Dataset with fine-grained action labels
reacted to merve's post with 🔥 15 days ago
VLMS 2025 UPDATE 🔥

We just shipped a blog post covering the latest in vision language models, including
🤖 GUI agents, agentic VLMs, omni models
📑 multimodal RAG
⏯️ video LMs
🤏🏻 smol models
...and more! https://huggingface.co/blog/vlms-2025
reacted to AdinaY's post with 😎 20 days ago
ACE-Step 🎵 a music generation foundation model released by StepFun & ACEStudio

Model: ACE-Step/ACE-Step-v1-3.5B
Demo: ACE-Step/ACE-Step

✨ 3.5B, Apache2.0 licensed
✨ 115× faster than LLMs (4-min music in 20s on A100)
✨ Diffusion + DCAE + linear transformer = speed + coherence
✨ Supports voice cloning, remixing, lyric editing & more
reacted to RiverZ's post with 🤗 21 days ago
🔥 We're thrilled to share some exciting news about ICEdit! The ICEdit app (RiverZ/ICEdit) has soared to second place on the weekly trending list of Hugging Face Spaces, trailing only Qwen3. It also holds second place on the overall Spaces trending list. This achievement wouldn't have been possible without your incredible support and love. A huge thank you to each and every one of you ❤!

🎉 The ICEdit community has been incredibly active, and we've seen a plethora of amazing ComfyUI workflows being shared. For instance, with the help of ComfyUI-nunchaku, you can run ICEdit locally with just 4GB of VRAM. This makes it much more accessible for those with limited hardware resources.

🎇 If you're interested in the details, please head over to our repository. We highly encourage you to give these workflows a try and explore the creative possibilities that ICEdit offers.

Github Repo: https://github.com/River-Zhang/ICEdit
Hugging Face Space: RiverZ/ICEdit
reacted to nyuuzyou's post with 🔥 21 days ago
๐Ÿ–ผ๏ธ PublicDomainFiles.com Collection - nyuuzyou/publicdomainfiles

Collection of 206,204 Public Domain multimedia files featuring:

- Comprehensive metadata: title, description, creator name, keywords, original page URL, and more.
- Contains various media types including images, clip art, artwork, fonts, videos, and TV shows.
- All content explicitly released into the public domain under the CC0 license.
- Organized in a single train split with 206,204 entries.
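
A minimal sketch of how one might browse such a collection with the `datasets` library. The repo id and the single train split come from the post. The column name `keywords` and the helper names are assumptions for illustration, so check the dataset card for the real schema. The sample records at the bottom are invented so the filter can be tried without downloading anything.

```python
from typing import Dict, Iterable, Iterator


def stream_records() -> Iterator[Dict]:
    """Stream the train split lazily (needs `datasets` and network access)."""
    from datasets import load_dataset

    return iter(
        load_dataset("nyuuzyou/publicdomainfiles", split="train", streaming=True)
    )


def with_keyword(
    records: Iterable[Dict], keyword: str, field: str = "keywords"
) -> Iterator[Dict]:
    """Yield records whose `field` mentions `keyword` (case-insensitive).

    `field` is an assumed column name, not confirmed by the post.
    """
    needle = keyword.lower()
    for record in records:
        value = record.get(field) or ""
        # Keywords may be stored as a list or as one string; handle both.
        if isinstance(value, (list, tuple)):
            text = " ".join(str(v) for v in value)
        else:
            text = str(value)
        if needle in text.lower():
            yield record


# Invented sample records, standing in for real dataset rows.
sample = [
    {"title": "Old map", "keywords": ["map", "history"]},
    {"title": "Cat clip art", "keywords": "cat, animal"},
]
print([r["title"] for r in with_keyword(sample, "cat")])
```

With network access, `with_keyword(stream_records(), "cat")` would apply the same filter to the real rows, assuming the column name matches.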
posted an update 22 days ago
FramePack is hands down one of the best OS releases in video generation 🙇🏻‍♀️🤯
✅ fully open sourced + amazing quality + reduced memory + improved speed
but even more - it's gonna facilitate *soooo* many downstream applications
like this version adapted for landscape rotation 👇 https://huggingface.co/spaces/tori29umai/FramePack_rotate_landscape
reacted to RiverZ's post with 🔥 22 days ago
🚀 Excited to Share Our Latest Work: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer～

🎨 Daily Paper:
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)

🔓 Code is now open source!
🔥 Huggingface DEMO:
RiverZ/ICEdit

🌐 Project Website: https://river-zhang.github.io/ICEdit-gh-pages/
🏠 GitHub Repository: https://github.com/River-Zhang/ICEdit/blob/main/scripts/gradio_demo.py
🤗 Huggingface:
sanaka87/ICEdit-MoE-LoRA

📄 arxiv Paper:
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)

🔥 Why it's cool:
- Achieves high-quality, multi-task image editing.
- Uses only 1% of the training parameters and 0.1% of the training data compared to existing methods: extremely efficient
- Beats several commercial models on background preservation, ID control, and consistency
- Open-source, low-cost, faster, and stronger: think of it as the "DeepSeek of image editing" 👀

We also implemented a Gradio demo app, available directly in our GitHub repo! And we made a flashy demo video, happy to send it your way!
reacted to abidlabs's post with ❤️ 25 days ago
HOW TO ADD MCP SUPPORT TO ANY 🤗 SPACE

Gradio now supports MCP! If you want to convert an existing Space, like this one hexgrad/Kokoro-TTS, so that you can use it with Claude Desktop / Cursor / Cline / TinyAgents / or any LLM that supports MCP, here's all you need to do:

1. Duplicate the Space (in the Settings Tab)
2. Upgrade the Gradio sdk_version to 5.28 (in the README.md)
3. Set mcp_server=True in launch()
4. (Optionally) add docstrings to the function so that the LLM knows how to use it, like this:

def generate(text, speed=1):
    """
    Convert text to speech audio.

    Parameters:
        text (str): The input text to be converted to speech.
        speed (float, optional): Playback speed of the generated speech.
    """
    ...  # the Space's existing function body stays unchanged


That's it! Now your LLM will be able to talk to you 🤯
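
The steps above can be sketched as a small Gradio app. This is an illustration under assumptions, not the Kokoro-TTS Space's actual code: `generate` here is a hypothetical stand-in that returns a text description instead of audio, and `build_demo` and `main` are names made up for the sketch. The `mcp_server=True` argument to `launch()` is the one named in the post; it needs Gradio 5.28 or later, and installing the MCP extra (`pip install "gradio[mcp]"`) is likely required as well.

```python
def generate(text: str, speed: float = 1.0) -> str:
    """
    Convert text to speech audio.

    Parameters:
        text (str): The input text to be converted to speech.
        speed (float, optional): Playback speed of the generated speech.
    """
    # Hypothetical stand-in: a real Space would run its TTS model here and
    # return audio. Returning a string lets the sketch run without a model.
    return f"[speech at {speed}x] {text}"


def build_demo():
    """Wrap generate() in a Gradio Interface (gradio is imported lazily)."""
    import gradio as gr

    return gr.Interface(
        fn=generate,
        inputs=[gr.Textbox(label="text"), gr.Number(value=1.0, label="speed")],
        outputs=gr.Textbox(label="result"),
    )


def main():
    # Step 3 from the post: launch with the MCP server enabled so the
    # function and its docstring are exposed to MCP clients as a tool.
    build_demo().launch(mcp_server=True)
```

Calling `main()` starts the app. The docstring is what an MCP client sees as the tool description, which is why step 4 recommends writing one.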
reacted to ginipick's post with 👍 25 days ago
🎨 Renoir Studio: Impressionist Masterpieces Reborn Through AI ✨

🌟 Experience Renoir's Magical Brushstrokes with AI!

🔗 Try it now: ginigen/flux-lora-renoir
🔗 Model page: openfree/pierre-auguste-renoir
🔗 Collection: openfree/painting-art-ai-681453484ec15ef5978bbeb1

Hello, AI art enthusiasts! 💖
Today I'm introducing a special model - Pierre-Auguste Renoir Studio. Create your own beautiful artwork in the style of the 19th century French Impressionist master! 🖼️

✨ Why Renoir's Style?
Renoir is famous for his luminous colors and soft brushstrokes. His works feature:

🌞 Warm sunshine and dancing light
👨‍👩‍👧‍👦 The beauty of everyday life and joyful moments
🌸 Vibrant nature and portraits of beautiful women
🎭 Lively Parisian social gatherings and outdoor scenes

🔬 Technical Features
This model was developed as a flux-based learning model trained on a curated collection of high-resolution masterpieces from renowned global artists. The LoRA fine-tuning process leveraged exceptional quality open-access imagery released by prestigious institutions including the Art Institute of Chicago. The resulting model demonstrates remarkable capability in capturing the nuanced artistic techniques and stylistic elements across diverse historical art movements! 🧠💫

🚀 How to Use

1. Describe your desired scene in the prompt box
2. Add the "renoir" keyword at the end (this is the trigger keyword!)
3. Click the 'Generate' button
4. Enjoy your ideas reborn in Renoir's style!

💡 Recommended Prompt Examples

"Elegant ladies enjoying a picnic in a sunlit garden, wearing pastel dresses and hats renoir"
"People boating by a riverbank, light reflecting on water, warmth of summer renoir"
"Paris cafe terrace, people chatting over coffee, evening sunset renoir"

🌈 Now It's Your Turn!
#AI #Renoir #ArtificialIntelligence #HuggingFace #FLUX #LoRA
reacted to sanaka87's post with 🔥 25 days ago
🚀 Excited to Share Our Latest Work: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer～

🎨 Daily Paper:
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)

🔓 Code is now open source!
🔥 Huggingface DEMO: RiverZ/ICEdit
🌐 Project Website: https://river-zhang.github.io/ICEdit-gh-pages/
🏠 GitHub Repository: https://github.com/River-Zhang/ICEdit/blob/main/scripts/gradio_demo.py
🤗 Huggingface: sanaka87/ICEdit-MoE-LoRA
📄 arxiv Paper: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)

🔥 Why it's cool:
- Achieves high-quality, multi-task image editing.
- Uses only 1% of the training parameters and 0.1% of the training data compared to existing methods: extremely efficient
- Beats several commercial models on background preservation, ID control, and consistency
- Open-source, low-cost, faster, and stronger: think of it as the "DeepSeek of image editing" 👀

We also implemented a Gradio demo app, available directly in our GitHub repo! And we made a flashy demo video, happy to send it your way!
reacted to jasoncorkill's post with 🚀 29 days ago
🚀 Building Better Evaluations: 32K Image Annotations Now Available

Today, we're releasing an expanded version: 32K images annotated with 3.7M responses from over 300K individuals, completed in under two weeks using the Rapidata Python API.

Rapidata/text-2-image-Rich-Human-Feedback-32k

A few months ago, we published one of our most liked datasets, with 13K images based on the @data-is-better-together's dataset, following Google's research on "Rich Human Feedback for Text-to-Image Generation" (https://arxiv.org/abs/2312.10240). It collected over 1.5M responses from 150K+ participants.

Rapidata/text-2-image-Rich-Human-Feedback

In the examples below, users highlighted words from prompts that were not correctly depicted in the generated images. Higher word scores indicate more frequent issues. If an image captured the prompt accurately, users could select [No_mistakes].
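
One way to picture these word scores is the sketch below. It is an assumption-laden illustration, not Rapidata's actual scoring: here a word's score is simply the fraction of annotators who highlighted it, and the function name and sample data are invented. An annotator who picks `[No_mistakes]` contributes no highlights but still counts toward the total.

```python
from collections import Counter
from typing import Dict, List

NO_MISTAKES = "[No_mistakes]"


def word_scores(prompt: str, selections: List[List[str]]) -> Dict[str, float]:
    """Fraction of annotators who flagged each prompt word as misdepicted."""
    counts: Counter = Counter()
    for picked in selections:
        # Count each word at most once per annotator; skip the sentinel.
        counts.update(w for w in set(picked) if w != NO_MISTAKES)
    total = len(selections)
    return {w: (counts[w] / total if total else 0.0) for w in prompt.split()}


# Invented example: four annotators reviewing one generated image.
scores = word_scores(
    "a red cat on the moon",
    [["red"], ["red", "moon"], [NO_MISTAKES], ["red"]],
)
print(scores)
```

In this toy case "red" would score highest, meaning the color was the part of the prompt most often judged wrong.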

We're continuing to work on large-scale human feedback and model evaluation. If you're working on related research and need large, high-quality annotations, feel free to get in touch: [email protected].
reacted to AdinaY's post with 🔥 29 days ago
Kimi-Audio 🚀🎧 an OPEN audio foundation model released by Moonshot AI
moonshotai/Kimi-Audio-7B-Instruct
✨ 7B
✨ 13M+ hours of pretraining data
✨ Novel hybrid input architecture
✨ Universal audio capabilities (ASR, AQA, AAC, SER, SEC/ASC, end-to-end conversation)
reacted to samihalawa's post with 🔥 about 1 month ago
SkyReels-V2 INFINITE VIDEO 🔥♾️🎬 UNLIMITED duration video generation model by Skywork.

> "Finally it's here. An open-source model that achieves what we have all been waiting for: infinite-length videos." 😮

Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought (2504.05599)

Model: Skywork/SkyReels-V2-T2V-14B-720P

✨ 1.3B & 14B
✨ Generates infinite length videos using Diffusion Forcing with diffusion models + autoregressive methods
reacted to victor's post with 👍 about 1 month ago
DIA TTS is just amazing - please share your funniest gens (here is mine) 😂
nari-labs/Dia-1.6B
reacted to AdinaY's post with 🔥 about 1 month ago
MAGI-1 🪄 the autoregressive diffusion video model, released by Sand AI

sand-ai/MAGI-1

✨ 24B with Apache 2.0
✨ Strong temporal consistency
✨ Benchmark-topping performance