RogerZhuo
AI & ML interests
None yet
Recent Activity
Liked a model 6 days ago: fofr/kontext-make-person-real
Liked a model 14 days ago: HiDream-ai/HiDream-E1-1
Liked a model 19 days ago: moonshotai/Kimi-K2-Instruct
Organizations
Reading
Music
- ElectricAlexis/NotaGen
  Updated • 148
- ASLP-lab/LLaSE-G1
  Audio-to-Audio • Updated • 24
- Di♪♪Rhythm
  Running on Zero • 616
  🎶 Blazingly Fast and Embarrassingly Simple Song Generation
- DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
  Paper • 2503.01183 • Published • 28
AI Arena
I2V
image-to-video
- Wan-AI/Wan2.1-T2V-1.3B
  Text-to-Video • Updated • 16.9k • 366
- VBench: Comprehensive Benchmark Suite for Video Generative Models
  Paper • 2311.17982 • Published • 9
- VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
  Paper • 2411.13503 • Published • 35
- tencent/HunyuanVideo-I2V
  Image-to-Video • Updated • 226 • 321
LLM
Foundation large models
must-read-papers
AI Papers
- Reinforcement Learning: An Overview
  Paper • 2412.05265 • Published • 8
- Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
  Paper • 2411.01156 • Published • 7
- VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
  Paper • 2503.21755 • Published • 34
- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 165
OCR
images
images
- black-forest-labs/FLUX.1-dev
  Text-to-Image • Updated • 1.54M • 11k
- cagliostrolab/animagine-xl-4.0
  Text-to-Image • Updated • 37.2k • 308
- Thera Arbitrary-Scale Super-Resolution
  Runtime error • 277
  🔥 Enhance image quality by scaling it up
- stepfun-ai/Step1X-Edit
  Image-to-Image • Updated • 270 • 310
TTS
Speech-related
- VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
  Paper • 2307.16430 • Published • 4
- Zyphra/Zonos-v0.1-transformer
  Text-to-Speech • Updated • 51.8k • 410
- IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
  Paper • 2502.05512 • Published • 2
- Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
  Paper • 2502.11946 • Published • 3
virtual try-on
Virtual makeover
- Learning Flow Fields in Attention for Controllable Person Image Generation
  Paper • 2412.08486 • Published • 37
- franciszzj/Leffa
  Image-to-Image • Updated • 332
- TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models
  Paper • 2411.18350 • Published • 30
- TryOffDiff
  Running on Zero • 55
  🔥 Extract garment images from everyday images!
Data