---
license: mit
datasets:
  - open-r1/OpenR1-Math-220k
---

# Namo R1

🤗 Namo-500M-V1   |   Community

> **You:** I don't have GPUs to run VLMs.
> **Namo R1:** Hold my beer... let's do this on CPU.

Namo R1 🔥🔥 surpasses SmolVLM and Moondream2 at a similar size! And we keep evolving: more advanced models are in training!

## Introduction

We are excited to open-source Namo, an extremely small yet mighty MLLM. While numerous MLLMs exist, few offer true extensibility or fully open-source their training data, model architectures, and training schedulers, all critical components for reproducible AI research.

The AI community has largely overlooked the potential of compact MLLMs, despite their demonstrated efficiency advantages. Our analysis reveals significant untapped potential in sub-billion parameter models, particularly for edge deployment and specialized applications. To address this gap, we're releasing Namo R1, a foundational 500M parameter model trained from scratch using innovative architectural choices.

Key innovations include:

1. **CPU friendly:** Namo R1 runs fast even on CPUs;
2. **Omni-modal Scalability:** Native support for future expansion into audio (ASR/TTS) and cross-modal fusion;
3. **Training Transparency:** Full disclosure of data curation processes and dynamic curriculum scheduling techniques.

👇 Video demo running on CPU:

GitHub: https://github.com/lucasjinreal/Namo-R1/

## Updates

- 2025.02.21: More to come...!
- 2025.02.21: 🔥🔥 The first version is open: fire up MLLM power that runs on CPU!
- 2025.02.17: Namo R1 started training.

## Results

Results may be updated as new models finish training.

| Model | MMB-EN-T | MMB-CN-T | Size |
|---|---|---|---|
| Namo-500M | 68.8 | 48.7 | 500M |
| Namo-700M | training | training | 700M |
| Namo-500M-R1 | training | training | 500M |
| Namo-700M-R1 | training | training | 700M |
| SmolVLM-500M | 53.8 | 35.4 | 500M |
| SmolVLM-Instruct-DPO | 67.5 | 49.8 | 2.3B |
| Moondream1 | 62.3 | 19.8 | 1.9B |
| Moondream2 | 70.0 | 28.7 | 1.9B |

⚠️ Currently, the testing has only been conducted on a limited number of benchmarks. In the near future, more metrics will be reported. Even so, we've observed significant improvements compared to other small models.

## Get Started

### Install & Run in CLI

All you need to do is:

```bash
pip install -U namo
```

A simple demo would be:

```python
from namo.api.vl import VLInfer

# model weights are downloaded automatically
model = VLInfer(model_type='namo')

# streaming output is enabled by default
model.generate('what is this?', 'images/cats.jpg', stream=True)
```

That's all!

For multi-turn chat in the terminal, you can run `python demo.py`. (A Namo CLI that runs directly in your terminal will be available later.)
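If you prefer to write your own loop instead of using `demo.py`, a minimal sketch is shown below. It only assumes the `VLInfer.generate` call from the quick-start above; whether conversation history carries across turns is handled inside the library, so treat this as an illustration rather than the official demo:

```python
from namo.api.vl import VLInfer

# Load the model once; weights are downloaded automatically on first use.
model = VLInfer(model_type='namo')

image_path = input('Image path (e.g. images/cats.jpg): ').strip()

# Simple read-print loop: each turn sends the user's prompt plus the image.
while True:
    prompt = input('You: ').strip()
    if prompt.lower() in {'exit', 'quit'}:
        break
    # Streaming prints tokens as they are generated.
    model.generate(prompt, image_path, stream=True)
    print()
```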

### OpenAI Server & Run in OpenWebUI

```bash
namo server --model checkpoints/Namo-500M-V1
```

You will then have an OpenAI-compatible server running locally.
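As a sketch of how to talk to it, any OpenAI-compatible client should work; the base URL, port, and model name below are assumptions, so check the address and model id that `namo server` prints on startup:

```python
from openai import OpenAI

# Hypothetical local endpoint; replace with the address printed by `namo server`.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="Namo-500M-V1",  # assumed model id; use whatever the server reports
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

In OpenWebUI, you can add the same base URL as an OpenAI-compatible connection to chat with the model from the browser.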

## Features of Namo R1

In contrast to open-source VLMs like Qwen2.5-3B and MiniCPM, the Namo series offers the following features that enable anyone to train their own VLMs from scratch:

- **Extremely Small:** Our first series has only 500 million parameters, yet is powerful across various tasks.
- **OCR Capability:** With just a 500M model, you can perform multilingual OCR, covering not only Chinese and English but also Japanese and other languages (see the sketch after this list).
- **Dynamic Resolution:** We support native dynamic resolution as input, making it robust for images of any aspect ratio.
- **Fully Open Source:** We open-source all model code, including training steps and scripts!
- **R1 Support:** Yes, we now support R1 for post-training.
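
As an illustration of the OCR use case, the snippet below reuses the `VLInfer` API from the quick-start; the image path and prompt wording are placeholders, not part of the official examples:

```python
from namo.api.vl import VLInfer

model = VLInfer(model_type='namo')

# Hypothetical document image; any scanned page or screenshot should work.
model.generate('Extract all text from this image.', 'images/receipt.jpg', stream=True)
```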

Above all, we are also ready to help when you want to train your own MLLM from scratch for any task!

## Roadmap

We are still actively training new models; here are a few things on the way:

- Speech model;
- Vision models with stronger vision encoders, such as SigLip2;
- TTS ability;
- Slightly larger models, up to 7B.

## Troubleshooting

1. Got an error when using DeepSpeed: `AssertionError: no_sync context manager is incompatible with gradient partitioning logic of ZeRO stage 2`?

   Please upgrade transformers to 4.48+ and use the latest DeepSpeed (e.g. `pip install -U "transformers>=4.48" deepspeed`).

## Copyright

All rights reserved by the Namo authors; code is released under the MIT License.