PM4Bench: A Parallel Multilingual Multi-Modal Multi-task Benchmark for Large Vision Language Model Paper • 2503.18484 • Published Mar 24
Coding Triangle: How Does Large Language Model Understand Code? Paper • 2507.06138 • Published 4 days ago • 18
Rethinking Verification for LLM Code Generation: From Generation to Testing Paper • 2507.06920 • Published 3 days ago • 26
Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective Paper • 2505.19815 • Published May 26 • 37
LiT: Delving into a Simplified Linear Diffusion Transformer for Image Generation Paper • 2501.12976 • Published Jan 22
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance Paper • 2401.08772 • Published Jan 16, 2024 • 1
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published Feb 10 • 61
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement Paper • 2501.12273 • Published Jan 21 • 14
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution Paper • 2410.16256 • Published Oct 21, 2024 • 61
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs Paper • 2410.12405 • Published Oct 16, 2024 • 13
InternLM-Law: An Open Source Chinese Legal Large Language Model Paper • 2406.14887 • Published Jun 21, 2024
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark Paper • 2405.12209 • Published May 20, 2024
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin Paper • 2407.10499 • Published Jul 15, 2024
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios Paper • 2408.17267 • Published Aug 30, 2024 • 24
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Paper • 2409.16191 • Published Sep 24, 2024 • 43
RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer Paper • 2304.05659 • Published Apr 12, 2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition Paper • 2309.15112 • Published Sep 26, 2023 • 2
BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues Paper • 2310.13650 • Published Oct 20, 2023