#85: Curiosity, Open Source, and Timing: The Formula Behind DeepSeek's Phenomenal Success
How an open-source mindset, relentless curiosity, and strategic calculation are rewriting the rules in AI and challenging Western companies, plus an excellent reading list and curated research collection
🐳 Turing Post is on 🤗 Hugging Face as a resident -> click to follow!
When we first covered DeepSeek models in August 2024 (we are opening that article for everyone, do read it), it didn't gain much traction. That surprised me! Back then, DeepSeek was already one of the most exciting examples of curiosity-driven research in AI, committed to open-sourcing its discoveries. They also employed an intriguing approach: unlike many others racing to beat benchmarks, DeepSeek pivoted to addressing specific challenges, fostering innovation that extended beyond conventional metrics. Even then, they demonstrated significant cost reductions.
"What's behind DeepSeek-Coder-V2 that makes it so special it outperforms GPT-4 Turbo, Claude-3 Opus, Gemini 1.5 Pro, Llama 3-70B, and Codestral in coding and math?
DeepSeek-Coder-V2, costing 20-50x less than other models, represents a major upgrade over the original DeepSeek-Coder. It features more extensive training data, larger and more efficient models, improved context handling, and advanced techniques like Fill-In-The-Middle and Reinforcement Learning." (Inside DeepSeek Models)
Although DeepSeek was making waves in the research community, it remained largely unnoticed by the broader public. But then they released R1-Zero and R1.
With that release they crushed industry benchmarks and disrupted the market by training their models at a fraction of the typical cost. But do you know what else they did? Not only did they prove that reinforcement learning (RL) is all you need for reasoning (R1 stands as solid proof of how well RL works), but they also embraced a trial-and-error approach, fundamental to RL, in their own business strategy. Previously overlooked, they calculated the release of R1 meticulously. Did you catch the timing? It was a strategic earthquake that shook the market and left everyone reeling:
- As ChinaTalk noticed: "R1's release during President Trump's inauguration last week was clearly intended to rattle public confidence in the United States' AI leadership at a pivotal moment in US policy, mirroring Huawei's product launch during former Secretary Raimondo's China visit. After all, the benchmark results of an R1 preview had already been public since November."
- The release happened just one week before the Chinese Lunar New Year (this year on January 29), which typically lasts 15 days. However, the week leading up to the holiday is often quiet, giving them a perfect window to outshine other Chinese companies and maximize their PR impact.
So, while the DeepSeek family of models serves as a case study in the power of open-source development paired with relentless curiosity (from an interview with Liang Wenfeng, DeepSeek's CEO: "Many might think there's an undisclosed business logic behind this, but in reality, it's primarily driven by curiosity."), it's also an example of cold-blooded calculation and a triumph of reinforcement learning applied to both models and humans :). DeepSeek has shown a deep understanding of how to play Western games and excel at them. Of course, today's market downturn, though concerning to many, will likely recover soon. However, if DeepSeek can achieve such outstanding results, Western companies need to reassess their strategies quickly and clarify their actual competitive moats.
Worries about NVIDIA
Of course, we'll still need a lot of compute; everyone is hungry for it. Here's a quote from Liang Wenfeng, DeepSeek's CEO: "For researchers, the thirst for computational power is insatiable. After conducting small-scale experiments, there's always a desire to conduct larger ones. Since then, we've consciously deployed as much computational power as possible."
So, let's not count NVIDIA out. What we can count on is Jensen Huang's knack for staying ahead and finding ways to remain relevant (NVIDIA wasn't started as an AI company, if you remember). What the rise of innovators like DeepSeek could push NVIDIA toward is doubling down on openness. Beyond the technical benefits, an aggressive push toward open-sourcing could serve as a powerful PR boost, reinforcing NVIDIA's centrality in the ever-expanding AI ecosystem.
As I was writing these words about NVIDIA, they sent a statement regarding DeepSeek: "DeepSeek is an excellent AI advancement and a perfect example of Test Time Scaling. DeepSeek's work illustrates how new models can be created using that technique, leveraging widely-available models and compute that is fully export control compliant. Inference requires significant numbers of NVIDIA GPUs and high-performance networking. We now have three scaling laws: pre-training and post-training, which continue, and new test-time scaling."
So, to wrap up, the main takeaways from the DeepSeek breakthrough are:
- open-source and decentralize
- stay curiosity-driven
- apply reinforcement learning to everything
For DeepSeek, this is just the beginning. As curiosity continues to drive its efforts, it has proven that breakthroughs come not from hoarding innovation but from sharing it. As we move forward, itās these principles that will shape the future of AI.
We are reading (it's all about 🐳)
Here is a collection of superb articles covering everything you need to know about DeepSeek:
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (the paper)
- Fresh release (Jan 27): DeepSeek Janus-Pro
- "DeepSeek-R1 and exploring DeepSeek-R1-Distill-Llama-8B" (Simon Willison's blog)
- China Open-Source 2024 Annual Report (a report in Chinese, but it can be summarized well with ChatGPT)
- [DeepSeek FAQ](https://stratechery.com/2025/deepseek-faq/) (by Stratechery, very detailed, I loved it)
- DeepSeek and the Future of AI Competition with Miles Brundage (by ChinaTalk)
- The Deep Roots of DeepSeek: How It All Began (by Recode China AI)
- DeepSeek means AI proliferation is guaranteed (by Import AI. Favorite quote: "Their key innovation lies in showing that any large language model (LLM) can be transformed into a reasoning powerhouse using just 800k curated samples of question-answer chains of thought.")
- Hugging Face's science team is fully reproducing and open-sourcing R1, including training data and training scripts (on GitHub)
Curated Collections
7 Open-source Methods to Improve Video Generation and Understanding
Weekly recommendation from an AI practitioner
To run DeepSeek models offline using LM Studio:
- Install LM Studio: Download the appropriate version for your operating system from the LM Studio website. Follow the installation instructions provided.
- Download the DeepSeek Model: Open LM Studio and navigate to the "Discover" tab. Search for "DeepSeek" and select your desired model. Click "Download" to save the model locally.
- Run the Model Offline: Once downloaded, go to the "Local Models" section. Select the DeepSeek model and click "Load." You can interact with the model directly within LM Studio without an internet connection. If you prefer to script your interactions, see the sketch right after these steps.
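If you want to call the model from code, LM Studio can also expose a local, OpenAI-compatible HTTP server (by default on port 1234) from its local server tab. Below is a minimal Python sketch under those assumptions; the model name `deepseek-r1-distill-qwen-7b` is a placeholder, so substitute whatever identifier LM Studio shows for the model you downloaded.

```python
import requests

# Assumption: LM Studio's local server is running on the default port (1234)
# and a DeepSeek model is already loaded. The model name below is a placeholder.
BASE_URL = "http://localhost:1234/v1"
MODEL = "deepseek-r1-distill-qwen-7b"

def chat(prompt: str) -> str:
    """Send a single-turn chat request to the local LM Studio server."""
    response = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.6,
        },
        timeout=300,  # local reasoning models can take a while on long answers
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Summarize what makes DeepSeek-R1's training approach notable."))
```

Everything runs locally, so once the model is downloaded no internet connection is needed.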
News from The Usual Suspects ©
Data Center News
$500B Stargate AI Venture by OpenAI, Oracle, and SoftBank
With plans to build massive data centers and energy facilities in Texas, Stargate aims to bolster U.S. AI dominance. Partners like NVIDIA and Microsoft bring muscle to this high-stakes competition with China. Trump supports it; Musk trashes it.
Meta's Manhattan-Sized AI Leap
Mark Zuckerberg's AI ambitions come on a smaller scale (haha): $65 billion for a data center so vast it could envelop Manhattan. With 1.3 million GPUs powering it, Meta aims to revolutionize its ecosystem and rival America's AI heavyweights. The era of AI megaprojects is here.
Mistral's IPO Plans: Vive la Résistance French AI startup Mistral isn't selling out. With €1 billion raised, CEO Arthur Mensch eyes an IPO while doubling down on open-source LLMs. Positioned as a European powerhouse, Mistral's independence signals Europe's readiness to play hardball in the global AI race.
SmolVLM: Hugging Face Goes Tiny Hugging Face introduces SmolVLM, two of the smallest foundation models yet. This open-source release proves size doesn't matter when efficiency leads the charge, setting new standards for compact AI development.
OpenAI's Agent Takes the Wheel CUA (Computer-Using Agent) redefines multitasking with Operator, seamlessly interacting with GUIs like a digital power user. From downloading PDFs to complex web tasks, it's the closest we've come to a universal assistant. CUA is now in Operator's research preview for Pro users. Blog. System Card.
Google DeepMind: A Year in Gemini's Orbit They just published an overview of 2024. From Gemini 2.0's breakthroughs in multimodal AI to the Willow chip's quantum strides, innovation soared. Med-Gemini aced medical exams, AlphaFold 3 advanced molecular science, and ALOHA redefined robotics. With disaster readiness, educational tools, and responsible AI initiatives, DeepMind balanced cutting-edge tech with global impact. A Nobel-worthy streak indeed.
Cost-Cutting AI with "Light Chips" Demis Hassabis unveils Google's next move: custom "light chips" designed to slash AI model costs while boosting efficiency. These chips power Gemini 2.0 Flash, with multimodal AI, 1M-token memory, and a "world model" vision for AGI. DeepMind's edge? Owning every layer of the AI stack, from chips to algorithms.
Top models to pay attention to
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Enhance reasoning in LLMs with multi-stage reinforcement learning, outperforming competitors in benchmarks like AIME 2024 and MATH-500.
- Kimi K1.5: Scaling Reinforcement Learning with LLMs Scale reasoning capabilities with efficient reinforcement learning methods, optimizing token usage for both long- and short-chain-of-thought tasks.
- VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Advance image and video understanding with multimodal integration, achieving top results in temporal reasoning and long-video tasks.
- Qwen2.5-1M Series Support 1M-token contexts with open-source models, leveraging sparse attention and lightning-fast inference frameworks for long-context tasks.
The freshest research papers, categorized for your convenience
There were quite a few top research papers this week; we mark them with 🌟 in each section.
Specialized Architectures and Techniques
- 🌟 Demons in the Detail: Introduces a load-balancing loss for training Mixture-of-Experts models.
- 🌟 Autonomy-of-Experts Models: Proposes expert self-selection to improve Mixture-of-Experts efficiency and scalability.
- O1-Pruner: Length-Harmonizing Fine-Tuning: Reduces inference overhead in reasoning models through reinforcement learning-based pruning.
Language Model Reasoning and Decision-Making
- 🌟 Evolving Deeper LLM Thinking: Explores genetic search methods to enhance natural language inference for planning tasks, achieving superior accuracy.
- 🌟 Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training: Develops a framework for LLMs to self-correct using Monte Carlo Tree Search and iterative refinement.
- 🌟 Reasoning Language Models: A Blueprint: Proposes a modular framework integrating reasoning methods to democratize reasoning capabilities.
- Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback: Enhances mathematical reasoning with stepwise binary feedback for more accurate LLM outputs.
- Test-Time Preference Optimization: Introduces a framework for aligning LLM outputs to human preferences during inference without retraining.
Multi-Agent Systems and Coordination
- 🌟 SRMT: Shared Memory for Multi-Agent Lifelong Pathfinding: Demonstrates shared memory use for enhanced coordination in multi-agent systems.
- Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks: Develops a hierarchical agent framework for mobile assistants with self-evolution capabilities.
Generative and Retrieval-Augmented Models
- Chain-of-Retrieval Augmented Generation: Presents a stepwise query and reasoning framework for retrieval-augmented generation.
- Can We Generate Images with CoT?: Integrates Chain-of-Thought reasoning for compositional and iterative image generation.
Multi-Modal and GUI Systems
- 🌟 UI-TARS: Pioneering Automated GUI Interaction: Advances vision-based agents for human-like GUI task performance.
- InternLM-XComposer2.5-Reward: Improves multi-modal reward modeling for text, image, and video alignment.
Robustness, Adaptability, and Uncertainty
- 🌟 Trading Inference-Time Compute for Adversarial Robustness: Examines inference-time compute scaling to improve robustness against adversarial attacks.
- Evolution and the Knightian Blindspot of Machine Learning: Advocates integrating evolutionary principles into machine learning for resilience to uncertainty.
Planning and Execution in AI
- 🌟 LLMs Can Plan Only If We Tell Them: Proposes structured state tracking to enhance planning capabilities in LLMs.
- Debate Helps Weak-to-Strong Generalization: Leverages debate methods to improve model generalization and alignment.
Social and Cognitive Insights
- Multiple Predictions of Othersā Actions in the Human Brain: Examines neural mechanisms for predicting social behaviors under ambiguity.
AI Infrastructure and Hardware
- 🌟 Good Things Come in Small Packages: Advocates Lite-GPUs for scalable and cost-effective AI infrastructure.
Thank you for reading! If you want to receive our articles straight to your inbox, please subscribe here