twinkle-ai/Llama-3.2-3B-F1-Instruct · 有可能包成GGUF格式的模型檔案給iOS上的Ollama用嗎？

alecyu2025

May 23

被據說這個模型很方便在手機上跑的傳聞吸引來，可是vLLM很明顯不是給手機用的。問了Gemini怎樣把huggingface提供的模型包給Ollama用，得到這個答覆：
https://g.co/gemini/share/230d173ab0ee
手機上跑不動也沒關係，M4 iPad Pro上頭應該跑得動。

shauns4y

Twinkle AI org May 24

DevQuasor 有在他的 repo 轉換了一份 GGUF of twinkle-ai/Llama-3.2-3B-F1-Instruct，可以參考這邊。
右上角 use this model 也提供非常多使用細節說明，例如你想透過 llama-cpp 在 local machine 上運行 Q4KM 量化（單純 CP 較高，可選不同量化）的 llamacpp-server 簡單實踐可以透過以下指令完成：

$llama-server -hf DevQuasar/twinkle-ai.Llama-3.2-3B-F1-Instruct-GGUF:Q4_K_M
...
main: server is listening on http://127.0.0.1:8080 - starting the main loop

此時就可以開啟本地http://127.0.0.1:8080進行測試，可以參照 llama-cpp 官方文件找到更多 server 類型，例如你需要的 ollama openai compatible api。

lianghsun changed discussion status to closed 18 days ago