sambit sekhar
sam2ai
AI & ML interests
Recsys, text2image, audio2image, transformer, ddpm, LLM
Recent Activity
- updated a Space 7 days ago: sam2ai/open-webui-odiagen
- published a Space 7 days ago: sam2ai/open-webui-odiagen
- updated a model 11 days ago: sam2ai/llama_3.1_8b_r_1
erase_image_add_image
- ProPainter: Improving Propagation and Transformer for Video Inpainting
  Paper • 2309.03897 • Published • 27
- Text2Layer: Layered Image Generation using Latent Diffusion Model
  Paper • 2307.09781 • Published • 15
- Generate Anything Anywhere in Any Scene
  Paper • 2306.17154 • Published • 22
- LRM: Large Reconstruction Model for Single Image to 3D
  Paper • 2311.04400 • Published • 52
Ai_Avatar
Video_gen
- VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation
  Paper • 2309.00398 • Published • 22
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
  Paper • 2307.04725 • Published • 64
- LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance
  Paper • 2307.00522 • Published • 32
- VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
  Paper • 2309.15091 • Published • 33
Doc_processing
- Nougat: Neural Optical Understanding for Academic Documents
  Paper • 2308.13418 • Published • 39
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
  Paper • 2307.02499 • Published • 14
- Text Rendering Strategies for Pixel Language Models
  Paper • 2311.00522 • Published • 12
NerF
Moe
Interpolation
speech2text
Text_to_image
- PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models
  Paper • 2309.05793 • Published • 50
- FreeU: Free Lunch in Diffusion U-Net
  Paper • 2309.11497 • Published • 65
- PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
  Paper • 2310.00426 • Published • 60
- LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
  Paper • 2311.05556 • Published • 87
Datasets
llm
- Large Language Models as Optimizers
  Paper • 2309.03409 • Published • 77
- One Wide Feedforward is All You Need
  Paper • 2309.01826 • Published • 33
- Self-Alignment with Instruction Backtranslation
  Paper • 2308.06259 • Published • 42
- Shepherd: A Critic for Language Model Generation
  Paper • 2308.04592 • Published • 32
segment_anything
- SLiMe: Segment Like Me
  Paper • 2309.03179 • Published • 30
- Follow Anything: Open-set detection, tracking, and following in real-time
  Paper • 2308.05737 • Published • 12
- Semantic-SAM: Segment and Recognize Anything at Any Granularity
  Paper • 2307.04767 • Published • 22
- Fast Segment Anything
  Paper • 2306.12156 • Published • 34
Llm_long_context
audio_llm
- LLaSM: Large Language and Speech Model
  Paper • 2308.15930 • Published • 34
- SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
  Paper • 2308.06873 • Published • 27
- AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
  Paper • 2308.05734 • Published • 37
- JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
  Paper • 2308.04729 • Published • 32
Text_trajectory_videogen
MM_LLM
- OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
  Paper • 2308.01390 • Published • 33
- Med-Flamingo: a Multimodal Medical Few-shot Learner
  Paper • 2307.15189 • Published • 23
- BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
  Paper • 2307.08581 • Published • 28
- GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
  Paper • 2307.03601 • Published • 12
Recsys
New_llm_arch
tinygpt
Llm_agent
Voice2sing
2d-to-3d-image
- SyncDreamer: Generating Multiview-consistent Images from a Single-view Image
  Paper • 2309.03453 • Published • 13
- TeCH: Text-guided Reconstruction of Lifelike Clothed Humans
  Paper • 2308.08545 • Published • 34
- Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model
  Paper • 2311.06214 • Published • 33