zh-ai-community's Collections
DeepSeek Papers
DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition
Paper • 2504.21801 • Published • 2
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 415
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper • 2505.09343 • Published • 68
DeepSeek-V3 Technical Report
Paper • 2412.19437 • Published • 68
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper • 2405.04434 • Published • 22
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Paper • 2406.11931 • Published • 65
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Paper • 2401.14196 • Published • 66
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Paper • 2408.08152 • Published • 60
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper • 2405.14333 • Published • 41
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 49
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Paper • 2412.10302 • Published • 18
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Paper • 2403.05525 • Published • 47
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 126
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 55
Inference-Time Scaling for Generalist Reward Modeling
Paper • 2504.02495 • Published • 57
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
Paper • 2310.16818 • Published • 32
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models
Paper • 2407.01906 • Published • 43
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Paper • 2410.13848 • Published • 35
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling
Paper • 2501.17811 • Published • 6
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Paper • 2502.11089 • Published • 164