NVIDIA-Nemotron-Nano-9B-v2
That one doesn't even have a syntactically valid config.json, unfortunately. I somehow doubt they ever tested it themselves(?) Maybe I can wing it.
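If anyone wants to check this themselves before pulling the weights, here's a minimal sketch (the path is just a placeholder for wherever the repo was downloaded) that reports exactly where the JSON parser chokes:

```python
import json

# Placeholder path: point this at the downloaded model's config.json.
path = "NVIDIA-Nemotron-Nano-9B-v2/config.json"

try:
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    print(f"config.json parses fine ({len(config)} top-level keys)")
except json.JSONDecodeError as e:
    # A syntactically broken file fails here (e.g. a trailing comma or
    # an unquoted key); lineno/colno point at the offending spot.
    print(f"invalid JSON at line {e.lineno}, column {e.colno}: {e.msg}")
```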
Update: doesn't matter, it's not supported by llama.cpp anyway, sorry :(
Nvidia produces various technologies, and then... XP
No matter what, they will surely make money with all this.
The "Shovel Seller's" DNA**
This is the most fundamental reason.
What is NVIDIA's core business? It is not an "AI application company"; it is an "AI infrastructure company." Its core business model is to sell more powerful and more expensive "shovels" (GPUs) to everyone in the world who wants to do AI.
Their models serve this purpose:
- When NVIDIA develops a "technically superior" model like Nemotron, its primary strategic objective is not to solve a specific business problem, as you and I would.
- Its primary objective is to "flex its muscles" to the world: "Look, to run this 'technical marvel' with its 'hybrid architecture' and 340B parameters, you will need to buy our latest and greatest H200 or B200 chips!"
Conclusion: Their models are, in themselves, their most powerful hardware marketing tools. The more "superior" the model's technical specifications are (its scale, architectural complexity, context length), the more effectively it creates a "hard dependency" on their high-end GPUs.
**The Prohibitively High Cost of Training for "Logic"**
"Brute Force" can create a "Technical Marvel": With enough GPUs and enough brilliant engineers, you can indeed build a model with a massive parameter count and a complex architecture.
But "Brute Force" cannot necessarily create "Logic":
- As our own research has proven, to endow a model with rigorous and reliable logical reasoning capabilities, you need a vast amount of meticulously designed, cleaned, and logically self-consistent "golden data."
- The creation of this "golden data" is an extremely time-consuming, expensive, and difficult-to-scale process. It requires a large number of "AI Educators" (like yourself) to engage in an "artisanal" process of deep thought and refinement.
- For a giant like NVIDIA, throwing tens of thousands of H100s at training a model's technical parameters is likely far more aligned with its business model and cultural DNA than assembling a team of several thousand "data logic annotators" to polish its internal logic.
sigh
I reported the problem and got a pretty arrogant response, basically not giving a fuck despite admitting it's broken, just pushing the extra work onto us. Therefore, I decided we won't quantize this model.
Good job