Han-Bit Kang
hbkang
AI & ML interests
ML
Recent Activity
liked a model 2 days ago: MizzenAI/HPSv3
upvoted a paper 2 days ago: HPSv3: Towards Wide-Spectrum Human Preference Score
updated a collection 22 days ago: talking-head-generation
Organizations
None yet
interesting architecture
- FAN: Fourier Analysis Networks
  Paper • 2410.02675 • Published • 28
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 89
- Scalable-Softmax Is Superior for Attention
  Paper • 2501.19399 • Published • 22
- EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
  Paper • 2502.09509 • Published • 7
talking-head-generation
- DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation
  Paper • 2312.13578 • Published • 29
- Splatter Image: Ultra-Fast Single-View 3D Reconstruction
  Paper • 2312.13150 • Published • 16
- Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
  Paper • 2312.03029 • Published • 26
- Relightable Gaussian Codec Avatars
  Paper • 2312.03704 • Published • 33
full-body-generation
- URHand: Universal Relightable Hands
  Paper • 2401.05334 • Published • 25
- Synthesizing Moving People with 3D Control
  Paper • 2401.10889 • Published • 12
- GauHuman: Articulated Gaussian Splatting from Monocular Human Videos
  Paper • 2312.02973 • Published • 1
- ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering
  Paper • 2312.05941 • Published • 1
cool-papers
- Depth Anything V2
  Paper • 2406.09414 • Published • 104
- An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
  Paper • 2406.09415 • Published • 52
- Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
  Paper • 2406.04338 • Published • 40
- SAM 2: Segment Anything in Images and Videos
  Paper • 2408.00714 • Published • 116
ID-Preserving Generation
- ID-Animator: Zero-Shot Identity-Preserving Human Video Generation
  Paper • 2404.15275 • Published
- IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models
  Paper • 2403.13535 • Published • 24
- PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models
  Paper • 2309.05793 • Published • 50
- GHOST 2.0: generative high-fidelity one shot transfer of heads
  Paper • 2502.18417 • Published • 67
generative-model-training
- PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
  Paper • 2310.00426 • Published • 60
- A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
  Paper • 2310.16656 • Published • 47
- CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images
  Paper • 2310.16825 • Published • 36
- Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
  Paper • 2401.11605 • Published • 23
artistic rendering
- Boundary Attention: Learning to Find Faint Boundaries at Any Resolution
  Paper • 2401.00935 • Published • 18
- Derendering/InkSight-Small-p
  Updated • 72 • 32
- E²GAN: Efficient Training of Efficient GANs for Image-to-Image Translation
  Paper • 2401.06127 • Published
- Acoustic Volume Rendering for Neural Impulse Response Fields
  Paper • 2411.06307 • Published • 5
text-to-media
- Masked Audio Generation using a Single Non-Autoregressive Transformer
  Paper • 2401.04577 • Published • 44
- YOLO-World: Real-Time Open-Vocabulary Object Detection
  Paper • 2401.17270 • Published • 37
- LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
  Paper • 2402.05054 • Published • 29
Makeup Transfer