-
Role-Playing Evaluation for Large Language Models
Paper • 2505.13157 • Published • 7 -
ChARM: Character-based Act-adaptive Reward Modeling for Advanced Role-Playing Language Agents
Paper • 2505.23923 • Published • 7 -
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
Paper • 2409.06820 • Published • 69 -
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles
Paper • 2502.09082 • Published • 30
YYY
zzfive
AI & ML interests
None yet
Recent Activity
updated
a collection
about 3 hours ago
RL+reason model
updated
a collection
about 3 hours ago
robot
updated
a collection
about 3 hours ago
RL+reason model
Organizations
None yet
industry
-
Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt
Paper • 2505.09264 • Published • 5 -
AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection
Paper • 2505.09926 • Published • 6 -
MatTools: Benchmarking Large Language Models for Materials Science Tools
Paper • 2505.10852 • Published • 6 -
Normalized Attention Guidance: Universal Negative Guidance for Diffusion Model
Paper • 2505.21179 • Published • 11
ssm
inference optimization
-
Low-Rank Adapters Meet Neural Architecture Search for LLM Compression
Paper • 2501.16372 • Published • 10 -
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
Paper • 2501.16937 • Published • 7 -
Matryoshka Quantization
Paper • 2502.06786 • Published • 30 -
Identifying Sensitive Weights via Post-quantization Integral
Paper • 2503.01901 • Published • 8
digital-human
-
One Shot, One Talk: Whole-body Talking Avatar from a Single Image
Paper • 2412.01106 • Published • 22 -
MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation
Paper • 2412.04448 • Published • 10 -
IDOL: Instant Photorealistic 3D Human Creation from a Single Image
Paper • 2412.14963 • Published • 6 -
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Paper • 2502.01061 • Published • 218
medical
-
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline
Paper • 2411.12814 • Published • 26 -
SegBook: A Simple Baseline and Cookbook for Volumetric Medical Image Segmentation
Paper • 2411.14525 • Published • 22 -
MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities
Paper • 2412.04106 • Published • 6 -
PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion
Paper • 2412.17780 • Published • 5
image
-
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Paper • 2401.09048 • Published • 10 -
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 18 -
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 62 -
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 75
video
-
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
Paper • 2401.09985 • Published • 18 -
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
Paper • 2401.09962 • Published • 9 -
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
Paper • 2401.10404 • Published • 11 -
ActAnywhere: Subject-Aware Video Background Generation
Paper • 2401.10822 • Published • 13
cv
-
LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper • 2403.09338 • Published • 9 -
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Paper • 2403.09394 • Published • 28 -
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper • 2402.19479 • Published • 35 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 31
datasets
-
MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
Paper • 2405.07526 • Published • 22 -
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
Paper • 2405.15613 • Published • 18 -
A Touch, Vision, and Language Dataset for Multimodal Alignment
Paper • 2402.13232 • Published • 15 -
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Paper • 2406.11813 • Published • 32
audio
-
SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation
Paper • 2405.18503 • Published • 9 -
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation
Paper • 2405.20289 • Published • 11 -
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Paper • 2406.02897 • Published • 16 -
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
Paper • 2406.03344 • Published • 21
dLLM
-
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
Paper • 2505.22618 • Published • 42 -
DINGO: Constrained Inference for Diffusion LLMs
Paper • 2505.23061 • Published • 30 -
Discrete Diffusion in Large Language and Multimodal Models: A Survey
Paper • 2506.13759 • Published • 41 -
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs
Paper • 2506.14429 • Published • 43
RAG
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 10 -
ReZero: Enhancing LLM search ability by trying one-more-time
Paper • 2504.11001 • Published • 15 -
Retrieval-Augmented Generation with Conflicting Evidence
Paper • 2504.13079 • Published • 7 -
NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes
Paper • 2504.11544 • Published • 42
safety
-
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
Paper • 2502.05163 • Published • 22 -
CRANE: Reasoning with constrained LLM generation
Paper • 2502.09061 • Published • 20 -
Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models
Paper • 2502.15799 • Published • 7 -
AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement
Paper • 2502.16776 • Published • 6
RL+reason model
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 5
benchmark
-
GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Paper • 2411.18499 • Published • 18 -
VLSBench: Unveiling Visual Leakage in Multimodal Safety
Paper • 2411.19939 • Published • 10 -
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
Paper • 2412.02611 • Published • 24 -
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs
Paper • 2412.03205 • Published • 16
3d
-
TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion
Paper • 2401.09416 • Published • 11 -
SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild
Paper • 2401.10171 • Published • 14 -
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model
Paper • 2311.09217 • Published • 22 -
GALA: Generating Animatable Layered Assets from a Single Scan
Paper • 2401.12979 • Published • 9
LLMs
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 150 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 14 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 59 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 49
agent
-
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
Paper • 2402.15506 • Published • 17 -
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
Paper • 2404.03648 • Published • 29 -
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
Paper • 2405.19893 • Published • 32 -
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
Paper • 2405.19888 • Published • 7
Infrastructure
-
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Paper • 2404.15653 • Published • 30 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 15 -
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 51 -
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper • 2405.12981 • Published • 34
multimodal
-
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 17 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 56 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 90 -
Matryoshka Multimodal Models
Paper • 2405.17430 • Published • 35
robot
-
GRUtopia: Dream General Robots in a City at Scale
Paper • 2407.10943 • Published • 26 -
Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion
Paper • 2407.10973 • Published • 11 -
Cross Anything: General Quadruped Robot Navigation through Complex Terrains
Paper • 2407.16412 • Published • 6 -
RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands
Paper • 2408.11048 • Published • 4
RolePlaying
-
Role-Playing Evaluation for Large Language Models
Paper • 2505.13157 • Published • 7 -
ChARM: Character-based Act-adaptive Reward Modeling for Advanced Role-Playing Language Agents
Paper • 2505.23923 • Published • 7 -
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
Paper • 2409.06820 • Published • 69 -
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles
Paper • 2502.09082 • Published • 30
dLLM
-
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
Paper • 2505.22618 • Published • 42 -
DINGO: Constrained Inference for Diffusion LLMs
Paper • 2505.23061 • Published • 30 -
Discrete Diffusion in Large Language and Multimodal Models: A Survey
Paper • 2506.13759 • Published • 41 -
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs
Paper • 2506.14429 • Published • 43
industry
-
Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt
Paper • 2505.09264 • Published • 5 -
AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection
Paper • 2505.09926 • Published • 6 -
MatTools: Benchmarking Large Language Models for Materials Science Tools
Paper • 2505.10852 • Published • 6 -
Normalized Attention Guidance: Universal Negative Guidance for Diffusion Model
Paper • 2505.21179 • Published • 11
RAG
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 10 -
ReZero: Enhancing LLM search ability by trying one-more-time
Paper • 2504.11001 • Published • 15 -
Retrieval-Augmented Generation with Conflicting Evidence
Paper • 2504.13079 • Published • 7 -
NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes
Paper • 2504.11544 • Published • 42
ssm
safety
-
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
Paper • 2502.05163 • Published • 22 -
CRANE: Reasoning with constrained LLM generation
Paper • 2502.09061 • Published • 20 -
Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models
Paper • 2502.15799 • Published • 7 -
AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement
Paper • 2502.16776 • Published • 6
inference optimization
-
Low-Rank Adapters Meet Neural Architecture Search for LLM Compression
Paper • 2501.16372 • Published • 10 -
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
Paper • 2501.16937 • Published • 7 -
Matryoshka Quantization
Paper • 2502.06786 • Published • 30 -
Identifying Sensitive Weights via Post-quantization Integral
Paper • 2503.01901 • Published • 8
RL+reason model
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 5
digital-human
-
One Shot, One Talk: Whole-body Talking Avatar from a Single Image
Paper • 2412.01106 • Published • 22 -
MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation
Paper • 2412.04448 • Published • 10 -
IDOL: Instant Photorealistic 3D Human Creation from a Single Image
Paper • 2412.14963 • Published • 6 -
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Paper • 2502.01061 • Published • 218
benchmark
-
GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Paper • 2411.18499 • Published • 18 -
VLSBench: Unveiling Visual Leakage in Multimodal Safety
Paper • 2411.19939 • Published • 10 -
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
Paper • 2412.02611 • Published • 24 -
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs
Paper • 2412.03205 • Published • 16
medical
-
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline
Paper • 2411.12814 • Published • 26 -
SegBook: A Simple Baseline and Cookbook for Volumetric Medical Image Segmentation
Paper • 2411.14525 • Published • 22 -
MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities
Paper • 2412.04106 • Published • 6 -
PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion
Paper • 2412.17780 • Published • 5
3d
-
TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion
Paper • 2401.09416 • Published • 11 -
SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild
Paper • 2401.10171 • Published • 14 -
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model
Paper • 2311.09217 • Published • 22 -
GALA: Generating Animatable Layered Assets from a Single Scan
Paper • 2401.12979 • Published • 9
image
-
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Paper • 2401.09048 • Published • 10 -
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 18 -
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 62 -
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 75
LLMs
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 150 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 14 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 59 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 49
video
-
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
Paper • 2401.09985 • Published • 18 -
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
Paper • 2401.09962 • Published • 9 -
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
Paper • 2401.10404 • Published • 11 -
ActAnywhere: Subject-Aware Video Background Generation
Paper • 2401.10822 • Published • 13
agent
-
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
Paper • 2402.15506 • Published • 17 -
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
Paper • 2404.03648 • Published • 29 -
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
Paper • 2405.19893 • Published • 32 -
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
Paper • 2405.19888 • Published • 7
cv
-
LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper • 2403.09338 • Published • 9 -
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Paper • 2403.09394 • Published • 28 -
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper • 2402.19479 • Published • 35 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 31
Infrastructure
-
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Paper • 2404.15653 • Published • 30 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 15 -
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 51 -
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper • 2405.12981 • Published • 34
datasets
-
MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
Paper • 2405.07526 • Published • 22 -
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
Paper • 2405.15613 • Published • 18 -
A Touch, Vision, and Language Dataset for Multimodal Alignment
Paper • 2402.13232 • Published • 15 -
How Do Large Language Models Acquire Factual Knowledge During Pretraining?
Paper • 2406.11813 • Published • 32
multimodal
-
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 17 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 56 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 90 -
Matryoshka Multimodal Models
Paper • 2405.17430 • Published • 35
audio
-
SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation
Paper • 2405.18503 • Published • 9 -
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation
Paper • 2405.20289 • Published • 11 -
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Paper • 2406.02897 • Published • 16 -
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
Paper • 2406.03344 • Published • 21
robot
-
GRUtopia: Dream General Robots in a City at Scale
Paper • 2407.10943 • Published • 26 -
Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion
Paper • 2407.10973 • Published • 11 -
Cross Anything: General Quadruped Robot Navigation through Complex Terrains
Paper • 2407.16412 • Published • 6 -
RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands
Paper • 2408.11048 • Published • 4