šŸŒ#85: Curiosity, Open Source, and Timing: The Formula Behind DeepSeekā€™s Phenomenal Success

Community Article · Published January 27, 2025

How an open-source mindset, relentless curiosity, and strategic calculation are rewriting the rules in AI and challenging Western companies, plus an excellent reading list and curated research collection


🔳 Turing Post is on 🤗 Hugging Face as a resident → click to follow!


When we first covered DeepSeek models in August 2024 (we are opening that article for everyone, do read it), it didn’t gain much traction. That surprised me! Back then, DeepSeek was already one of the most exciting examples of curiosity-driven research in AI, committed to open-sourcing its discoveries. They also employed an intriguing approach: unlike many others racing to beat benchmarks, DeepSeek pivoted to addressing specific challenges, fostering innovation that extended beyond conventional metrics. Even then, they demonstrated significant cost reductions.

“What’s behind DeepSeek-Coder-V2 that makes it so special it outperforms GPT-4 Turbo, Claude-3 Opus, Gemini 1.5 Pro, Llama 3-70B, and Codestral in coding and math?

DeepSeek-Coder-V2, costing 20–50x less than other models, represents a major upgrade over the original DeepSeek-Coder. It features more extensive training data, larger and more efficient models, improved context handling, and advanced techniques like Fill-In-The-Middle and Reinforcement Learning.” (Inside DeepSeek Models)
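For context: Fill-In-The-Middle (FIM) trains a code model to complete a span given both the code before and after it, rather than only left-to-right context. Here is a minimal sketch of how such a prompt is typically assembled; the sentinel token strings below are placeholders, since each model family defines its own special tokens:

```python
def build_fim_prompt(prefix: str, suffix: str,
                     pre_tok: str = "<fim_prefix>",
                     suf_tok: str = "<fim_suffix>",
                     mid_tok: str = "<fim_middle>") -> str:
    # PSM ("prefix-suffix-middle") layout: the model sees both surrounding
    # contexts, then generates the missing middle span after mid_tok.
    return f"{pre_tok}{prefix}{suf_tok}{suffix}{mid_tok}"

code_before = "def mean(xs):\n    return "
code_after = " / len(xs)\n"
print(build_fim_prompt(code_before, code_after))
# A FIM-trained model would be expected to fill in: sum(xs)
```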

Although DeepSeek was making waves in the research community, it remained largely unnoticed by the broader public. But then they released R1-Zero and R1.


With that release they crushed industry benchmarks and disrupted the market by training their models at a fraction of the typical cost. But do you know what else they did? Not only did they show that reinforcement learning (RL) can carry reasoning on its own (R1 stands as solid proof of how well RL works), but they also embraced a trial-and-error approach – fundamental to RL – for their own business strategy. Previously overlooked, DeepSeek timed the release of R1 meticulously. Did you catch the timing? It was a strategic earthquake that shook the market and left everyone reeling:

  1. As ChinaTalk noted: “R1’s release during President Trump’s inauguration last week was clearly intended to rattle public confidence in the United States’ AI leadership at a pivotal moment in US policy, mirroring Huawei’s product launch during former Secretary Raimondo’s China visit. After all, the benchmark results of an R1 preview had already been public since November.”
  2. The release happened just one week before the Chinese Lunar New Year (this year on January 29), which typically lasts 15 days. However, the week leading up to the holiday is often quiet, giving them a perfect window to outshine other Chinese companies and maximize their PR impact.

So, while the DeepSeek family of models serves as a case study in the power of open-source development paired with relentless curiosity (from an interview with Liang Wenfeng, DeepSeek’s CEO: “Many might think there’s an undisclosed business logic behind this, but in reality, it’s primarily driven by curiosity.”), it’s also an example of cold-blooded calculation and a triumph of reinforcement learning applied to both models and humans :). DeepSeek has shown a deep understanding of how to play Western games and excel at them. Of course, the market will likely recover soon from today’s downturn, concerning as it is to many. However, if DeepSeek can achieve such outstanding results, Western companies need to reassess their strategies quickly and clarify their actual competitive moats.
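On the model side, the R1 recipe centers on reinforcement learning with verifiable rewards via GRPO (Group Relative Policy Optimization): rather than training a separate critic, each sampled answer is scored relative to the other answers drawn for the same prompt. Here is a minimal sketch of that group-relative advantage computation; `verify` is a hypothetical stand-in for the rule-based checks (exact-match answers, format rewards) the R1 report describes:

```python
import statistics

def verify(answer: str, gold: str) -> float:
    # Rule-based verifier: reward 1.0 for an exact match, else 0.0.
    # (A hypothetical stand-in for R1-style accuracy/format checks.)
    return 1.0 if answer.strip() == gold else 0.0

def group_relative_advantages(answers: list[str], gold: str) -> list[float]:
    # GRPO's core trick: normalize each answer's reward against the
    # group sampled for the same prompt, so no value model is needed.
    rewards = [verify(a, gold) for a in answers]
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Toy example: four sampled answers to "What is 6 * 7?"
print(group_relative_advantages(["42", "41", "42", "43"], gold="42"))
# -> [1.0, -1.0, 1.0, -1.0]
```

Answers that beat their group’s average receive positive advantages and get reinforced; everything below average is pushed down – no critic network required.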

Worries about NVIDIA

Of course, we’ll still need a lot of compute – everyone is hungry for it. As Liang Wenfeng, DeepSeek’s CEO, put it: “For researchers, the thirst for computational power is insatiable. After conducting small-scale experiments, there’s always a desire to conduct larger ones. Since then, we’ve consciously deployed as much computational power as possible.”

So, let’s not count NVIDIA out. What we can count on is Jensen Huang’s knack for finding ways to stay relevant (NVIDIA wasn’t started as an AI company, if you remember). The rise of innovators like DeepSeek could push NVIDIA to double down on openness. Beyond the technical benefits, an aggressive push toward open-sourcing could serve as a powerful PR boost, reinforcing NVIDIA’s centrality in the ever-expanding AI ecosystem.

As I was writing these words about NVIDIA, the company sent out a statement regarding DeepSeek: “DeepSeek is an excellent AI advancement and a perfect example of Test Time Scaling. DeepSeek’s work illustrates how new models can be created using that technique, leveraging widely-available models and compute that is fully export control compliant. Inference requires significant numbers of NVIDIA GPUs and high-performance networking. We now have three scaling laws: pre-training and post-training, which continue, and new test-time scaling.”
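“Test-time scaling” simply means spending more inference compute per query to get a better answer. Here is a minimal sketch of one common flavor, best-of-N sampling with majority voting (often called self-consistency); `generate` is a toy stand-in for any real model call:

```python
import random
from collections import Counter

def generate(prompt: str) -> str:
    # Toy stand-in for a real LLM call; answers a question noisily.
    return random.choice(["42", "42", "42", "41", "43"])

def best_of_n(prompt: str, n: int = 16) -> str:
    # Self-consistency: sample n candidate answers and return the
    # majority vote. More samples = more test-time compute = better odds.
    answers = [generate(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

print(best_of_n("What is 6 * 7?"))  # almost always "42"
```

Every extra sample burns more GPU time per query, which is exactly why NVIDIA frames test-time scaling as a third driver of compute demand.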

So – to wrap up – the main takeaways from the DeepSeek breakthrough are:

  • open-source and decentralize
  • stay curiosity-driven
  • apply reinforcement learning to everything

For DeepSeek, this is just the beginning. As curiosity continues to drive its efforts, it has proven that breakthroughs come not from hoarding innovation but from sharing it. As we move forward, itā€™s these principles that will shape the future of AI.

[Image: Clem’s tweet]

We are reading (it’s all about 🐳)

Here is a collection of superb articles covering everything you need to know about DeepSeek:

[Image: curated list of DeepSeek articles]

And yes, I agree with merve:

[Image: merve’s post]

Curated Collections

7 Open-source Methods to Improve Video Generation and Understanding

Weekly recommendation from an AI practitioner 👍🏼

To run DeepSeek models offline using LM Studio (a short scripting sketch follows the steps below):

  1. Install LM Studio: Download the appropriate version for your operating system from the LM Studio website and follow the installation instructions provided.
  2. Download the DeepSeek model: Open LM Studio and navigate to the "Discover" tab. Search for "DeepSeek" and select your desired model. Click "Download" to save the model locally.
  3. Run the model offline: Once downloaded, go to the "Local Models" section. Select the DeepSeek model and click "Load." You can interact with the model directly within LM Studio without an internet connection.
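If you would rather script against the model than use the chat window, LM Studio can also expose a local OpenAI-compatible server (by default on http://localhost:1234). A minimal sketch using plain `requests`; the model identifier below is hypothetical and should match whatever you actually downloaded:

```python
import requests

# LM Studio's local server speaks the OpenAI chat-completions format.
# Start the server from within LM Studio first; 1234 is the default port.
resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "deepseek-r1-distill-qwen-7b",  # hypothetical id; use your model's
        "messages": [{"role": "user", "content": "Summarize GRPO in one sentence."}],
        "temperature": 0.6,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```

Once the model is on disk, this works fully offline – the request never leaves your machine.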

News from The Usual Suspects ©

  • Data Center News

    $500B Stargate AI Venture by OpenAI, Oracle, and SoftBank

    With plans to build massive data centers and energy facilities in Texas, Stargate aims to bolster U.S. AI dominance. Partners like NVIDIA and Microsoft bring muscle to this high-stakes competition with China. Trump supports it; Musk trashes it.

    Meta's Manhattan-Sized AI Leap

    Mark Zuckerberg’s AI ambitions come on a smaller scale (haha) – $65 billion for a data center so vast it could envelop Manhattan. With 1.3 million GPUs powering this, Meta aims to revolutionize its ecosystem and rival America’s AI heavyweights. The era of AI megaprojects is here.

  • Mistral’s IPO Plans: Vive la Résistance. French AI startup Mistral isn’t selling out. With €1 billion raised, CEO Arthur Mensch eyes an IPO while doubling down on open-source LLMs. Positioned as a European powerhouse, Mistral’s independence signals Europe’s readiness to play hardball in the global AI race.

  • SmolVLM: Hugging Face Goes Tiny. Hugging Face introduces SmolVLM, two of the smallest vision-language foundation models yet. This open-source release proves size doesn’t matter when efficiency leads the charge, setting new standards for compact AI development.

  • OpenAI’s Agent Takes the Wheel. CUA (Computer-Using Agent) redefines multitasking with Operator, seamlessly interacting with GUIs like a digital power user. From downloading PDFs to complex web tasks, it’s the closest we’ve come to a universal assistant. CUA is now in Operator’s research preview for Pro users. Blog. System Card.

  • Google DeepMind: A Year in Gemini’s Orbit. They just published an overview of 2024. From Gemini 2.0’s breakthroughs in multimodal AI to the Willow chip’s quantum strides, innovation soared. Med-Gemini aced medical exams, AlphaFold 3 advanced molecular science, and ALOHA redefined robotics. With disaster readiness, educational tools, and responsible AI initiatives, DeepMind balanced cutting-edge tech with global impact. A Nobel-worthy streak indeed.

  • Cost-Cutting AI with “Light Chips”. Demis Hassabis unveils Google’s next move – custom “light chips” designed to slash AI model costs while boosting efficiency. These chips power Gemini 2.0 Flash, with multimodal AI, 1M-token memory, and a “world model” vision for AGI. DeepMind’s edge? Owning every layer of the AI stack, from chips to algorithms.

Top models to pay attention to

  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Enhance reasoning in LLMs with multi-stage reinforcement learning, outperforming competitors in benchmarks like AIME 2024 and MATH-500.
  • Kimi K1.5: Scaling Reinforcement Learning with LLMs Scale reasoning capabilities with efficient reinforcement learning methods, optimizing token usage for both long- and short-chain-of-thought tasks.
  • VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Advance image and video understanding with multimodal integration, achieving top results in temporal reasoning and long-video tasks.
  • Qwen2.5-1M Series Support 1M-token contexts with open-source models, leveraging sparse attention and lightning-fast inference frameworks for long-context tasks.

The freshest research papers, categorized for your convenience

There were quite a few top research papers this week; we mark them with 🌟 in each section.

Specialized Architectures and Techniques

  • 🌟 Demons in the Detail: Introduces load-balancing loss for training Mixture-of-Experts models.
  • 🌟 Autonomy-of-Experts Models: Proposes expert self-selection to improve Mixture-of-Experts efficiency and scalability.
  • O1-Pruner: Length-Harmonizing Fine-Tuning: Reduces inference overhead in reasoning models through reinforcement learning-based pruning.

Language Model Reasoning and Decision-Making

  • 🌟 Evolving Deeper LLM Thinking: Explores genetic search methods to enhance natural language inference for planning tasks, achieving superior accuracy.
  • 🌟 Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training: Develops a framework for LLMs to self-correct using Monte Carlo Tree Search and iterative refinement.
  • 🌟 Reasoning Language Models: A Blueprint: Proposes a modular framework integrating reasoning methods to democratize reasoning capabilities.
  • Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback: Enhances mathematical reasoning with stepwise binary feedback for more accurate LLM outputs.
  • Test-Time Preference Optimization: Introduces a framework for aligning LLM outputs to human preferences during inference without retraining.

Multi-Agent Systems and Coordination

Generative and Retrieval-Augmented Models

Multi-Modal and GUI Systems

Robustness, Adaptability, and Uncertainty

Planning and Execution in AI

Social and Cognitive Insights

AI Infrastructure and Hardware


Thank you for reading! 📨 If you want to receive our articles straight to your inbox, please subscribe here

