- MLLM-as-a-Judge for Image Safety without Human Labeling
  Paper • 2501.00192 • Published • 31
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 107
- Xmodel-2 Technical Report
  Paper • 2412.19638 • Published • 27
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 101
Collections
Collections including paper arxiv:2504.17192
- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
  Paper • 2502.02508 • Published • 23
- Chain of Draft: Thinking Faster by Writing Less
  Paper • 2502.18600 • Published • 48
- Chain of Agents: Large Language Models Collaborating on Long-Context Tasks
  Paper • 2406.02818 • Published
- Chain-of-Retrieval Augmented Generation
  Paper • 2501.14342 • Published • 56

- RuCCoD: Towards Automated ICD Coding in Russian
  Paper • 2502.21263 • Published • 132
- Unified Reward Model for Multimodal Understanding and Generation
  Paper • 2503.05236 • Published • 121
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
  Paper • 2503.05179 • Published • 46
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
  Paper • 2503.05592 • Published • 27

- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 111
- SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
  Paper • 2503.11576 • Published • 98
- ToolRL: Reward is All Tool Learning Needs
  Paper • 2504.13958 • Published • 39
- OTC: Optimal Tool Calls via Reinforcement Learning
  Paper • 2504.14870 • Published • 31

- SurveyX: Academic Survey Automation via Large Language Models
  Paper • 2502.14776 • Published • 100
- The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
  Paper • 2408.06292 • Published • 124
- Towards an AI co-scientist
  Paper • 2502.18864 • Published • 49
- Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
  Paper • 2504.17192 • Published • 58

- How to Synthesize Text Data without Model Collapse?
  Paper • 2412.14689 • Published • 53
- SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
  Paper • 2412.12094 • Published • 11
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
  Paper • 2306.07691 • Published • 8
- iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform
  Paper • 2203.02395 • Published

- GenEx: Generating an Explorable World
  Paper • 2412.09624 • Published • 97
- IamCreateAI/Ruyi-Mini-7B
  Image-to-Video • Updated • 470 • 609
- Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation
  Paper • 2412.06016 • Published • 20
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 101

- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
  Paper • 2411.11504 • Published • 22
- Top-nσ: Not All Logits Are You Need
  Paper • 2411.07641 • Published • 22
- Adaptive Decoding via Latent Preference Optimization
  Paper • 2411.09661 • Published • 10
- When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
  Paper • 2411.13476 • Published • 16

- LinFusion: 1 GPU, 1 Minute, 16K Image
  Paper • 2409.02097 • Published • 35
- Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
  Paper • 2409.11406 • Published • 28
- Diffusion Models Are Real-Time Game Engines
  Paper • 2408.14837 • Published • 126
- Segment Anything with Multiple Modalities
  Paper • 2408.09085 • Published • 23

- MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
  Paper • 2310.03731 • Published • 29
- ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search
  Paper • 2310.13227 • Published • 13
- ChipNeMo: Domain-Adapted LLMs for Chip Design
  Paper • 2311.00176 • Published • 9
- Language Models can be Logical Solvers
  Paper • 2311.06158 • Published • 23