Could you share more information about the on device demo?
Your demo looks impressive! To achieve the performance shown in your on-device demo, it seems that a processing speed of at least one frame per second is required.
However, real-time streaming input at a high frame rate on an iPad (even with the M4 chip) seems unlikely: according to the llama.cpp performance data (https://github.com/ggerganov/llama.cpp/discussions/4167), llama.cpp on the M4 chip achieves only 230.18 tokens/s.
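To make that concern concrete, here is a rough back-of-envelope check. The 230.18 tokens/s figure is the one reported in the linked discussion; the tokens-per-frame values are purely my assumptions, since the number of visual tokens each sampled frame adds to the context is not stated anywhere.

```cpp
// Rough back-of-envelope check of the frame-rate concern above.
// The 230.18 tok/s figure comes from the linked llama.cpp discussion;
// the tokens-per-frame values are assumptions, since the demo does not
// state how many tokens each sampled frame contributes to the context.
#include <cstdio>

int main() {
    const double decode_tok_per_s = 230.18;                 // reported M4 throughput
    const int assumed_tokens_per_frame[] = {64, 128, 256};  // hypothetical visual-token budgets

    for (int tokens : assumed_tokens_per_frame) {
        // Ignoring generation and any prefill/decode asymmetry, this is the
        // ceiling on how many frames per second the model could ingest.
        const double max_fps = decode_tok_per_s / tokens;
        std::printf("%3d tokens/frame -> at most %.2f frames/s\n", tokens, max_fps);
    }
    return 0;
}
```

Even under those optimistic assumptions there is little headroom left for actually generating a response, which is where my doubt about real-time streaming comes from.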
Is your demo running entirely on the iPad? If so, could you share which backend your on-device demo uses (MLC, llama.cpp, GPU, CPU, or a proprietary closed-source solution)? Also, what is the sampling frame rate of the streaming input in your demo video?
Thank you very much!
Can you tell what this video is?
What does the video above show?
Yes, the code shown in our video runs entirely on the iPad in airplane mode. The frame rate of the video input is one frame per second.
As you said, the original llama.cpp running on an iPad cannot achieve the effect shown in the video. Over the past few months our team has deeply modified the llama.cpp code so that different modules run on the NPU/GPU/CPU simultaneously, which is how the result in the video is achieved.
I will submit a PR to the official llama.cpp repository in the future so that everyone can experience the omni mode for themselves.
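For readers wondering what "different modules on different compute units" might look like, here is a minimal, illustrative-only sketch of a pipelined setup where a vision-encoder stage and a decoder stage run concurrently on separate (conceptual) units. None of these types or functions come from llama.cpp/ggml or the team's actual code; the `Unit` enum, `Channel`, `encode_vision`, and `decode_tokens` are hypothetical placeholders.

```cpp
// Illustrative sketch only: a two-stage pipeline where each stage could, in
// principle, be bound to a different compute unit (NPU/GPU/CPU). All names
// here are hypothetical placeholders, not llama.cpp/ggml APIs.
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>

enum class Unit { NPU, GPU, CPU };

struct Frame  { int id; };
struct Embeds { int frame_id; };

// Thread-safe queue connecting the pipeline stages.
template <typename T>
class Channel {
public:
    void push(T v) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    T pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !q_.empty(); });
        T v = std::move(q_.front()); q_.pop();
        return v;
    }
private:
    std::queue<T> q_;
    std::mutex m_;
    std::condition_variable cv_;
};

// Hypothetical per-module work; real code would call into Core ML / Metal /
// CPU kernels here. The Unit argument only documents where the work would run.
Embeds encode_vision(const Frame& f, Unit)  { return {f.id}; }
void   decode_tokens(const Embeds& e, Unit) { std::printf("frame %d decoded\n", e.frame_id); }

int main() {
    Channel<Frame>  frames;
    Channel<Embeds> embeds;

    // Vision-encoder stage, conceptually pinned to the NPU.
    std::thread encoder([&] {
        for (;;) {
            Frame f = frames.pop();
            if (f.id < 0) { embeds.push({-1}); break; }  // sentinel: shut down
            embeds.push(encode_vision(f, Unit::NPU));
        }
    });

    // Language-model decode stage, conceptually pinned to the GPU.
    std::thread decoder([&] {
        for (;;) {
            Embeds e = embeds.pop();
            if (e.frame_id < 0) break;
            decode_tokens(e, Unit::GPU);
        }
    });

    // CPU thread feeds one frame per second, matching the stated sampling rate.
    for (int i = 0; i < 3; ++i) {
        frames.push({i});
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
    frames.push({-1});

    encoder.join();
    decoder.join();
    return 0;
}
```

The point of the sketch is only the overlap: while the decoder works on frame N, the encoder can already process frame N+1, so the per-frame latency is bounded by the slowest stage rather than the sum of all stages.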
Thanks a lot! Currently llama.cpp/ggml does not have on-device NPU backend support. I hope everyone can benefit from your PR.