fulong ye

Alon77777

https://scholar.google.com.hk/citations?hl=zh-CN&user=-BbQ5VgAAAAJ

superhero-7

AI & ML interests

vision and language, diffusion model, text-to-image generation, image-to-text generation, referring expression generation and comprehension

Recent Activity

upvoted a paper about 2 months ago

Phantom-Data : Towards a General Subject-Consistent Video Generation Dataset

upvoted a paper 3 months ago

Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors

authored a paper 4 months ago

DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning

upvoted a paper about 2 months ago

Phantom-Data : Towards a General Subject-Consistent Video Generation Dataset

Paper • 2506.18851 • Published Jun 23 • 29

upvoted a paper 3 months ago

Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors

Paper • 2505.24625 • Published May 30 • 8

upvoted a paper 4 months ago

DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning

Paper • 2504.14509 • Published Apr 20 • 51

upvoted a paper 6 months ago

Phantom: Subject-consistent video generation via cross-modal alignment

Paper • 2502.11079 • Published Feb 16 • 60

upvoted 2 papers 8 months ago

HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 105

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Paper • 2412.18619 • Published Dec 16, 2024 • 59

upvoted 6 papers about 1 year ago

Aquila2 Technical Report

Paper • 2408.07410 • Published Aug 14, 2024 • 15

IMAGDressing-v1: Customizable Virtual Dressing

Paper • 2407.12705 • Published Jul 17, 2024 • 13

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Paper • 2407.07895 • Published Jul 10, 2024 • 43

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Paper • 2407.04842 • Published Jul 5, 2024 • 57

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6, 2024 • 76

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published May 16, 2024 • 31

upvoted 7 papers over 1 year ago

Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support

Paper • 2401.14688 • Published Jan 26, 2024 • 13

Repositioning the Subject within Image

Paper • 2401.16861 • Published Jan 30, 2024 • 14

MM-LLMs: Recent Advances in MultiModal Large Language Models

Paper • 2401.13601 • Published Jan 24, 2024 • 49

upvoted a paper almost 2 years ago

UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

Paper • 2311.09257 • Published Nov 14, 2023 • 48