Xenova posted an update 27 days ago

Why in the world would it be cool to run something in a browser when it can run locally using llama.cpp??


Why in the world would you bother installing llama.cpp when you can just open a webpage?

Depends on the GPU hardware, tbh. Not everyone can get 90 tokens/sec. :)

As a frontend dev, I'd say LLMs were not meant for the browser. You have to download the weights every time you reload the page. It's impressive that they run well in the browser at all, but I don't see any practical use cases.


You don't download the weights every time; they are usually stored in OPFS or IndexedDB.
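
For anyone curious, here's a minimal sketch of that caching pattern using OPFS (the Origin Private File System). The helper name and the file-naming scheme are my own assumptions, not any particular library's API; real libraries handle sharded weight files and versioning too:

```ts
// Hypothetical helper: download model weights on the first visit,
// then serve them from OPFS on every subsequent page load.
async function getWeights(url: string): Promise<ArrayBuffer> {
  const root = await navigator.storage.getDirectory(); // OPFS root directory
  const name = url.split("/").pop()!;                  // assumed naming scheme

  try {
    // Cache hit: the file already exists in OPFS, so no network request.
    const handle = await root.getFileHandle(name);
    const file = await handle.getFile();
    return await file.arrayBuffer();
  } catch {
    // Cache miss (getFileHandle throws NotFoundError): fetch and persist.
    const bytes = await (await fetch(url)).arrayBuffer();
    const handle = await root.getFileHandle(name, { create: true });
    const writable = await handle.createWritable();
    await writable.write(bytes);
    await writable.close();
    return bytes;
  }
}
```

So the download cost is paid once per origin, not once per reload, as long as the user doesn't clear site data.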