DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model Paper • 2405.04434 • Published May 7, 2024 • 21
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 145
mobiuslabsgmbh/DeepSeek-R1-ReDistill-Llama3-8B-v1.1 Text Generation • 8B • Updated Jan 30 • 12 • 11