SAMed-2: Selective Memory Enhanced Medical Segment Anything Model Paper • 2507.03698 • Published Jul 2025 • 11
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation Paper • 2506.18095 • Published Jun 22, 2025 • 65
Agentic Robot: A Brain-Inspired Framework for Vision-Language-Action Models in Embodied Agents Paper • 2505.23450 • Published May 29, 2025 • 9
NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes Paper • 2504.11544 • Published Apr 15, 2025 • 43
On the Compositional Generalization of Multimodal LLMs for Medical Imaging Paper • 2412.20070 • Published Dec 28, 2024 • 47
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published Dec 25, 2024 • 105
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 127
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination Paper • 2411.03823 • Published Nov 6, 2024 • 50
Roadmap towards Superhuman Speech Understanding using Large Language Models Paper • 2410.13268 • Published Oct 17, 2024 • 35
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts Paper • 2410.10626 • Published Oct 14, 2024 • 40
MLP-KAN: Unifying Deep Representation and Function Learning Paper • 2410.03027 • Published Oct 3, 2024 • 32
Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs Paper • 2409.10994 • Published Sep 17, 2024 • 1
One missing piece in Vision and Language: A Survey on Comics Understanding Paper • 2409.09502 • Published Sep 14, 2024 • 26