Linoy Tsaban

linoyts

AI & ML interests

None yet

Recent Activity

liked a model about 4 hours ago
tori29umai/FramePackI2V_HY_rotate_landscape
liked a Space about 4 hours ago
tori29umai/FramePack_rotate_landscape
liked a Space about 4 hours ago
ByteDance/ID-Patch-SDXL

Organizations

Hugging Face, 🧨Diffusers, Hugging Face Internal Testing Organization, Huggingface Projects, Snap Research, Weizmann Institute of Science, Editing Images, leditsplusplus, Latent Consistency, Editing Audio, Women on Hugging Face, +RAIN film festival, diffusers-internal-dev, rnri-inversion, Snapchat Inc., Latent Explorers, open/ acc, RF Inversion, FlowEdit, CRINGE, Réflexion IA, IP Composer, Inference Endpoints Images

linoyts's activity

reacted to RiverZ's post with 🔥 about 4 hours ago
🚀 Excited to Share Our Latest Work: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer~

🎨 Daily Paper:
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)


🔓 Code is now open source!
🔥 Huggingface DEMO:
RiverZ/ICEdit

🌐 Project Website: https://river-zhang.github.io/ICEdit-gh-pages/
🏠 GitHub Repository: https://github.com/River-Zhang/ICEdit/blob/main/scripts/gradio_demo.py
🤗 Huggingface:
sanaka87/ICEdit-MoE-LoRA

📄 arXiv Paper:
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)


🔥 Why it’s cool:
- Achieves high-quality, multi-task image editing.
- Uses only 1% of the training parameters and 0.1% of the training data compared to existing methods — extremely efficient
- Beats several commercial models on background preservation, ID control, and consistency
- Open-source, low-cost, faster, and stronger — think of it as the “DeepSeek of image editing” 👀

We also implemented a Gradio demo app, available directly in our GitHub repo! And we made a flashy demo video — happy to send it your way!
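
For reference, a minimal diffusers sketch of loading the released LoRA; the base checkpoint (FLUX.1 Fill) and the loading API below are assumptions, not the authors' reference code, so see the gradio_demo.py linked above for the exact editing setup:

import torch
from diffusers import FluxFillPipeline

# Assumed base checkpoint; the LoRA repo name is taken from the post.
pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("sanaka87/ICEdit-MoE-LoRA")
# Prompting, masking, and the in-context diptych layout follow the GitHub demo script.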
reacted to abidlabs's post with ❤️ 3 days ago
HOW TO ADD MCP SUPPORT TO ANY 🤗 SPACE

Gradio now supports MCP! If you want to convert an existing Space, like this one (hexgrad/Kokoro-TTS), so that you can use it with Claude Desktop / Cursor / Cline / TinyAgents / or any LLM that supports MCP, here's all you need to do:

1. Duplicate the Space (in the Settings Tab)
2. Upgrade the Gradio sdk_version to 5.28 (in the README.md)
3. Set mcp_server=True in launch()
4. (Optionally) add docstrings to the function so that the LLM knows how to use it, like this:

def generate(text, speed=1):
    """
    Convert text to speech audio.

    Parameters:
        text (str): The input text to be converted to speech.
        speed (float, optional): Playback speed of the generated speech.
    """
    ...

That's it! Now your LLM will be able to talk to you 🤯
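
Putting steps 1-4 together, a minimal end-to-end sketch; the Interface wiring and the placeholder audio are assumptions, since the actual Kokoro Space has its own model and UI:

import numpy as np
import gradio as gr

def generate(text, speed=1):
    """Convert text to speech audio (see the full docstring above)."""
    # Placeholder tone instead of a real TTS model, to keep the sketch self-contained.
    sr = 22050
    t = np.linspace(0, 1, sr)
    return sr, 0.1 * np.sin(2 * np.pi * 220 * speed * t)

demo = gr.Interface(
    fn=generate,
    inputs=[gr.Textbox(label="Text"), gr.Number(value=1, label="Speed")],
    outputs=gr.Audio(label="Speech"),
)
demo.launch(mcp_server=True)  # step 3: exposes the function as an MCP tool (Gradio >= 5.28)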
reacted to ginipick's post with 👍 3 days ago
🎨 Renoir Studio: Impressionist Masterpieces Reborn Through AI ✨

🌟 Experience Renoir's Magical Brushstrokes with AI!

🔗 Try it now: ginigen/flux-lora-renoir
🔗 Model page: openfree/pierre-auguste-renoir
🔗 Collection: openfree/painting-art-ai-681453484ec15ef5978bbeb1

Hello, AI art enthusiasts! 💖
Today I'm introducing a special model - Pierre-Auguste Renoir Studio. Create your own beautiful artwork in the style of the 19th century French Impressionist master! 🖼️
✨ Why Renoir's Style?
Renoir is famous for his luminous colors and soft brushstrokes. His works feature:

🌞 Warm sunshine and dancing light
👨‍👩‍👧‍👦 The beauty of everyday life and joyful moments
🌸 Vibrant nature and portraits of beautiful women
🎭 Lively Parisian social gatherings and outdoor scenes

🔬 Technical Features
This is a FLUX-based model fine-tuned with LoRA on a curated collection of high-resolution masterpieces by renowned artists. The fine-tuning leveraged exceptional-quality open-access imagery released by prestigious institutions, including the Art Institute of Chicago. The resulting model captures the nuanced artistic techniques and stylistic elements of diverse historical art movements! 🧠💫
🚀 How to Use

Describe your desired scene in the prompt box
Add the "renoir" keyword at the end (this is the trigger keyword!)
Click the 'Generate' button
Enjoy your ideas reborn in Renoir's style!

💡 Recommended Prompt Examples

"Elegant ladies enjoying a picnic in a sunlit garden, wearing pastel dresses and hats renoir"
"People boating by a riverbank, light reflecting on water, warmth of summer renoir"
"Paris cafe terrace, people chatting over coffee, evening sunset renoir"

🌈 Now It's Your Turn!
#AI #Renoir #ArtificialIntelligence #HuggingFace #FLUX #LoRA
reacted to sanaka87's post with 🔥 3 days ago
reacted to jasoncorkill's post with 🚀 7 days ago
🚀 Building Better Evaluations: 32K Image Annotations Now Available

Today, we're releasing an expanded version: 32K images annotated with 3.7M responses from over 300K individuals, collected in under two weeks using the Rapidata Python API.

Rapidata/text-2-image-Rich-Human-Feedback-32k

A few months ago, we published one of our most-liked datasets, with 13K images based on @data-is-better-together's dataset, following Google's research on "Rich Human Feedback for Text-to-Image Generation" (https://arxiv.org/abs/2312.10240). It collected over 1.5M responses from 150K+ participants.

Rapidata/text-2-image-Rich-Human-Feedback

In the examples below, users highlighted words from prompts that were not correctly depicted in the generated images. Higher word scores indicate more frequent issues. If an image captured the prompt accurately, users could select [No_mistakes].

We're continuing to work on large-scale human feedback and model evaluation. If you're working on related research and need large, high-quality annotations, feel free to get in touch: [email protected].
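
To explore the annotations programmatically, a minimal sketch with the datasets library; split and column names are not documented here, so check the dataset card first:

from datasets import load_dataset

ds = load_dataset("Rapidata/text-2-image-Rich-Human-Feedback-32k")  # repo from the post
print(ds)                   # available splits, row counts, and column names
first_split = next(iter(ds))
print(ds[first_split][0])   # one annotated image with its aggregated human responses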
reacted to AdinaY's post with 🔥 7 days ago
Kimi-Audio 🚀🎧 an OPEN audio foundation model released by Moonshot AI
moonshotai/Kimi-Audio-7B-Instruct
✨ 7B
✨ 13M+ hours of pretraining data
✨ Novel hybrid input architecture
✨ Universal audio capabilities (ASR, AQA, AAC, SER, SEC/ASC, end-to-end conversation)
reacted to samihalawa's post with 🔥 11 days ago
SkyReels-V2 INFINITE VIDEO🔥♾️🎬 UNLIMITED duration video generation model by Skywork.

> “Finally, it's here. An open-source model that achieves what we've all been waiting for: infinite-length videos.”😮

Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought (2504.05599)

Model: Skywork/SkyReels-V2-T2V-14B-720P

✨ 1.3B & 14B
✨ Generates infinite-length videos with Diffusion Forcing, combining diffusion models with autoregressive methods
reacted to victor's post with 👍 11 days ago
DIA TTS is just amazing - please share your funniest gens (here is mine) 😂
nari-labs/Dia-1.6B
reacted to AdinaY's post with 🔥 12 days ago
MAGI-1 🪄 the autoregressive diffusion video model, released by Sand AI

sand-ai/MAGI-1

✨ 24B with Apache 2.0
✨ Strong temporal consistency
✨ Benchmark-topping performance
posted an update 13 days ago
reacted to fdaudens's post with 🤯 25 days ago
🎨 Designers, meet OmniSVG! This new model helps you create professional vector graphics from text/images, generate editable SVGs from icons to detailed characters, convert rasters to vectors, maintain style consistency with references, and integrate into your workflow.

@OmniSVG
reacted to ajibawa-2023's post with 🔥 25 days ago
Hi all, I recently released two audio datasets generated from my earlier dataset: ajibawa-2023/Children-Stories-Collection

First audio dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection-Large has 5,600+ stories in .mp3 format.

Second audio dataset: https://huggingface.co/datasets/ajibawa-2023/Audio-Children-Stories-Collection has 600 stories in .mp3 format.
reacted to AdinaY's post with 🔥 about 1 month ago
reacted to seawolf2357's post with 🔥 about 1 month ago
🎨 Ghibli-Style Image Generation with Multilingual Text Integration: FLUX.1 Hugging Face Edition 🌏✨

Hello creators! Today I'm introducing a special image generator that combines the beautiful aesthetics of Studio Ghibli with multilingual text integration! 😍

seawolf2357/Ghibli-Multilingual-Text-rendering

✨ Key Features

Ghibli-Style Image Generation - High-quality animation-style images based on FLUX.1
Multilingual Text Rendering - Support for Korean, Japanese, English, and all languages! 🇰🇷🇯🇵🇬🇧
Automatic Image Editing with Simple Prompts - Just input your desired text and you're done!
Two Stylistic Variations Provided - Get two different results from a single prompt
Full Hugging Face Spaces Support - Deploy and share instantly!

🚀 How Does It Work?

Enter a prompt describing your desired image (e.g., "a cat sitting by the window")
Input the text you want to add (any language works!)
Select the text position, size, and color
Two different versions are automatically generated!

💯 Advantages of This Model

No Tedious Post-Editing Needed - Text is perfectly integrated during generation
Natural Text Integration - Text automatically adjusts to match the image style
Perfect Multilingual Support - Any language renders beautifully!
User-Friendly Interface - Easily adjust text size, position, and color
One-Click Hugging Face Deployment - Use immediately without complex setup

🎭 Use Cases

Creating multilingual greeting cards
Animation-style social media content
Ghibli-inspired posters or banners
Character images with dialogue in various languages
Sharing with the community through Hugging Face Spaces

This project leverages Hugging Face's FLUX.1 model to open new possibilities for seamlessly integrating high-quality Ghibli-style images with multilingual text using just prompts! 🌈
Try it now and create your own artistic masterpieces! 🎨✨

#GhibliStyle #MultilingualSupport #AIImageGeneration #TextRendering #FLUX #HuggingFace
reacted to ZhiyuanthePony's post with 🤗 about 1 month ago
🎉 Thrilled to share our #CVPR2025 accepted work:
Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data (2503.21694)

🔥 Key Innovations:
1️⃣ First to adapt SD for direct textured mesh generation (1-2s inference)
2️⃣ Novel teacher-student framework leveraging multi-view diffusion models ([MVDream](https://arxiv.org/abs/2308.16512) & [RichDreamer](https://arxiv.org/abs/2311.16918))
3️⃣ Parameter-efficient tuning - only +2.6% params over base SD
4️⃣ 3D data-free training liberates the model from dataset constraints

💡 Why it matters:
→ A novel 3D-data-free paradigm
→ Outperforms data-driven methods on creative concept generation
→ Unlocks web-scale text corpora for 3D content creation

🌐 Project: https://theericma.github.io/TriplaneTurbo/
🎮 Demo: ZhiyuanthePony/TriplaneTurbo
💻 Code: https://github.com/theEricMa/TriplaneTurbo
reacted to prithivMLmods's post with 👍 about 1 month ago
Dropping downstream-task models that use newly initialized classifier parameters (classifier.bias & weights) to support domain-specific 𝗶𝗺𝗮𝗴𝗲 𝗰𝗹𝗮𝘀𝘀𝗶𝗳𝗶𝗰𝗮𝘁𝗶𝗼𝗻. Based on siglip2-base-patch16-224 and DomainNet (single-domain, multi-source adaptation), with Fashion-MNIST & more for experimental testing. 🧤☄️

Fashion-Mnist : prithivMLmods/Fashion-Mnist-SigLIP2
Mnist-Digits : prithivMLmods/Mnist-Digits-SigLIP2
Multisource-121 : prithivMLmods/Multisource-121-DomainNet
Painting-126 : prithivMLmods/Painting-126-DomainNet
Sketch-126 : prithivMLmods/Sketch-126-DomainNet
Clipart-126 : prithivMLmods/Clipart-126-DomainNet

Models are trained with different parameter settings for experimental purposes only, with the intent of further development. Refer to the model page below for instructions on running it with Transformers 🤗.
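
As a quick start, a minimal sketch with the Transformers image-classification pipeline; it assumes the checkpoints expose a standard classification head, so defer to each model page for the exact usage:

from transformers import pipeline

# Model repo taken from the list above; "sneaker.jpg" is a hypothetical local image.
classifier = pipeline("image-classification", model="prithivMLmods/Fashion-Mnist-SigLIP2")
print(classifier("sneaker.jpg"))  # top predicted Fashion-MNIST labels with scores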

Collection : prithivMLmods/domainnet-0324-67e0e3c934c03cc40c6c8782

Citations : SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features https://arxiv.org/pdf/2502.14786 & Moment Matching for Multi-Source Domain Adaptation : https://arxiv.org/pdf/1812.01754

reacted to Yehor's post with 👍 2 months ago
Published a stable version of a Ukrainian text-to-speech library on GitHub and PyPI.

Features:

- Multi-speaker model: 2 female (Tetiana, Lada) + 1 male (Mykyta) voices;
- Fine-grained control over speech parameters, including duration, fundamental frequency (F0), and energy;
- High-fidelity speech generation using the RAD-TTS++ acoustic model;
- Fast vocoding using Vocos;
- Synthesizes long sentences effectively;
- Supports a sampling rate of 44.1 kHz;
- Tested on Linux environments and Windows/WSL;
- Python API (requires Python 3.9 or later);
- CUDA-enabled for GPU acceleration.

Repository: https://github.com/egorsmkv/tts_uk
reacted to freddyaboulton's post with 🚀 2 months ago
Getting WebRTC and WebSockets right in Python is very tricky. If you've tried to wrap an LLM in a real-time audio layer, then you know what I'm talking about.

That's where FastRTC comes in! It makes WebRTC and WebSocket streams super easy, with minimal code and overhead.

Check out our org: hf.co/fastrtc
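
A minimal echo sketch in the spirit of the FastRTC quickstart; the handler signature and Stream arguments are assumptions, so check the docs at hf.co/fastrtc:

from fastrtc import Stream, ReplyOnPause

def echo(audio):
    # audio arrives as a (sample_rate, numpy_array) tuple; stream it straight back
    yield audio

# ReplyOnPause invokes the handler each time the user stops talking.
stream = Stream(handler=ReplyOnPause(echo), modality="audio", mode="send-receive")
stream.ui.launch()  # serves a Gradio UI on top of the WebRTC stream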
reacted to burtenshaw's post with 🔥 2 months ago
Now the Hugging Face agent course is getting real! With frameworks like smolagents, LlamaIndex, and LangChain.

🔗 Follow the org for updates agents-course

This week we are releasing the first framework unit in the course and it’s on smolagents. This is what the unit covers:

- why should you use smolagents vs another library?
- how to build agents that use code
- build multi-agent systems
- use vision language models for browser use

The team has been working flat out on this for a few weeks. Led by @sergiopaniego and supported by smolagents author @m-ric.
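
To illustrate the "agents that use code" idea, a minimal smolagents sketch; the tool and model choices are illustrative defaults, not taken from the course unit:

from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# CodeAgent writes and executes Python snippets step by step to solve the task.
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())
agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?")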
reacted to AdinaY's post with ❤️ 2 months ago
🚀 StepFun阶跃星辰 is making BIG open moves!

Last year, their GOT-OCR 2.0 took the community by storm 🔥 but many didn't know they were also building some amazing models. Now, they've just dropped something huge on the Hub!

📺 Step-Video-T2V: a 30B bilingual open video model that generates 204 frames (8-10s) at 540P resolution with high information density & consistency.
stepfun-ai/stepvideo-t2v

🔊 Step-Audio-TTS-3B : a TTS trained with the LLM-Chat paradigm on a large synthetic dataset, capable of generating RAP & Humming
stepfun-ai/step-audio-67b33accf45735bb21131b0b