Johannes Kolbe PRO
johko
AI & ML interests
None yet
Recent Activity
- liked a Space 10 days ago: ariG23498/zero-shot-od
- liked a Space 17 days ago: Wan-AI/Wan-2.2-5B
- upvoted an article about 1 month ago: FineWeb-C: A Community-Driven Dataset for Educational Quality Annotations in 122 Languages
Organizations
Deceptive Prompts for MLLMs
- A Survey on Hallucination in Large Vision-Language Models
  Paper • 2402.00253 • Published
- Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance
  Paper • 2402.08680 • Published • 1
- How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts
  Paper • 2402.13220 • Published • 15
- FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
  Paper • 2404.05046 • Published
Virtual Try-On
- IMAGDressing-v1: Customizable Virtual Dressing
  Paper • 2407.12705 • Published • 13
- Dress Code: High-Resolution Multi-Category Virtual Try-On
  Paper • 2204.08532 • Published • 2
- OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
  Paper • 2403.01779 • Published • 31
- Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing
  Paper • 2403.14828 • Published
Consistent Image Generation
- Training-Free Consistent Text-to-Image Generation
  Paper • 2402.03286 • Published • 68
- The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
  Paper • 2311.10093 • Published • 59
- DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
  Paper • 2402.09812 • Published • 16
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
  Paper • 2405.01434 • Published • 57
VLM Interleaved Images
- LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
  Paper • 2407.07895 • Published • 43
- SEED-Story: Multimodal Long Story Generation with Large Language Model
  Paper • 2407.08683 • Published • 26
- ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation
  Paper • 2407.06135 • Published • 23
- InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
  Paper • 2407.03320 • Published • 96
Text driven Image Editing
Point Tracking
- CoTracker
  Space • Running on Zero • 271
  Track points in a video
- CoTracker: It is Better to Track Together
  Paper • 2307.07635 • Published • 18
- TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement
  Paper • 2306.08637 • Published
- DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video
  Paper • 2403.14548 • Published