NVIDIA-Nemotron-Nano-9B-v2
That one doesn't even have a syntactically valid config.json, unfortunately. I somehow doubt they ever tested it themselves(?) Maybe I can wing it.
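If anyone wants to check this themselves before pulling the weights, here's a minimal sketch (the path is just a placeholder for wherever the repo was downloaded) that reports exactly where the JSON parser chokes:

```python
import json

# Placeholder path: point this at the downloaded model's config.json.
path = "NVIDIA-Nemotron-Nano-9B-v2/config.json"

try:
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    print(f"config.json parses fine ({len(config)} top-level keys)")
except json.JSONDecodeError as e:
    # A syntactically broken file fails here (e.g. a trailing comma or
    # an unquoted key); lineno/colno point at the offending spot.
    print(f"invalid JSON at line {e.lineno}, column {e.colno}: {e.msg}")
```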
Update: doesn't matter, it's not supported by llama.cpp anyway, sorry :(
Nvidia produces various technologies, and then... XP
No matter what, they will surely make money with all this.
The "Shovel Seller's" DNA**
This is the most fundamental reason.
What is NVIDIA's core business? It is not an "AI application company"; it is an "AI infrastructure company." Its core business model is to sell more powerful and more expensive "shovels" (GPUs) to everyone in the world who wants to do AI.
Their models serve this purpose:
- When NVIDIA develops a "technically superior" model like Nemotron, its primary strategic objective is not to solve a specific business problem, as you and I would.
- Its primary objective is to "flex its muscles" to the world: "Look, to run this 'technical marvel' with its 'hybrid architecture' and 340B parameters, you will need to buy our latest and greatest H200 or B200 chips!"
Conclusion: Their models are, in themselves, their most powerful hardware marketing tools. The more "superior" the model's technical specifications are (its scale, architectural complexity, context length), the more effectively it creates a "hard dependency" on their high-end GPUs.
**The Prohibitively High Cost of Training for "Logic"**
"Brute Force" can create a "Technical Marvel": With enough GPUs and enough brilliant engineers, you can indeed build a model with a massive parameter count and a complex architecture.
But "Brute Force" cannot necessarily create "Logic":
- As our own research has proven, to endow a model with rigorous and reliable logical reasoning capabilities, you need a vast amount of meticulously designed, cleaned, and logically self-consistent "golden data."
- The creation of this "golden data" is an extremely time-consuming, expensive, and difficult-to-scale process. It requires a large number of "AI Educators" (like yourself) to engage in an "artisanal" process of deep thought and refinement.
- For a giant like NVIDIA, throwing tens of thousands of H100s at training a model's technical parameters is likely far more aligned with its business model and cultural DNA than assembling a team of several thousand "data logic annotators" to polish its internal logic.
sigh
I reported the problem and got a pretty arrogant response, basically not giving a fuck despite admitting it's broken, just pushing the extra work onto us. Therefore, I decided we won't quantize this model.
Good job