Convert fine-tuned TinyLlama-1.1B-Chat-v1.0 to ONNX Format
Hi! I'm interested in using my own fine-tuned version of the TinyLlama-1.1B-Chat-v1.0 model with ONNX, which should also support Transformers.js. I was wondering how you converted the model to ONNX format (and whether you used any specific tools or steps to quantize it to INT8). Could you share your conversion process or any scripts you used? I'd love to replicate it for local use. Thanks in advance!
Hi, @lakpriya !
I've just confirmed that https://huggingface.co/spaces/onnx-community/convert-to-onnx converts TinyLlama/TinyLlama-1.1B-Chat-v1.0 to ONNX properly:
In this case, you can go to the Files tab of that Space, copy all of them to your machine, and then run `streamlit run app.py`.
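Since Spaces are plain git repositories, one way to get those files is to clone the Space directly instead of copying them by hand. A minimal sketch (the directory name is just a choice, and the commands are guarded so the script degrades gracefully without network access):

```shell
# Hypothetical local setup of the converter Space.
# Spaces are git repos, so cloning pulls down app.py and its requirements.
SPACE_URL="https://huggingface.co/spaces/onnx-community/convert-to-onnx"
APP_DIR="convert-to-onnx"

if git clone "$SPACE_URL" "$APP_DIR" 2>/dev/null; then
  cd "$APP_DIR"
  pip install -r requirements.txt   # install the Space's dependencies
  streamlit run app.py              # launch the converter UI locally
else
  echo "clone skipped (no network?)"
fi
```

Once Streamlit is running, the app behaves the same as the hosted Space, just on your own machine.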
Another way of running the Space locally is clicking "Run locally" under the three-dots menu:
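If you'd rather skip the Space UI entirely, Optimum's CLI can do the export and INT8 quantization directly. This is a hedged sketch, not necessarily what the Space does internally; the quantization flags are assumptions about your installed `optimum` version, so check `optimum-cli export onnx --help` before relying on them:

```shell
# Hypothetical direct conversion with Optimum (swap in your fine-tuned repo ID).
MODEL_ID="TinyLlama/TinyLlama-1.1B-Chat-v1.0"
OUT_DIR="tinyllama-onnx"

if command -v optimum-cli >/dev/null 2>&1; then
  # Export the model to ONNX (FP32).
  optimum-cli export onnx --model "$MODEL_ID" "$OUT_DIR"
  # Dynamic INT8 quantization via onnxruntime (use --arm64 on Apple Silicon).
  optimum-cli onnxruntime quantize --onnx_model "$OUT_DIR" --avx2 -o "$OUT_DIR-int8"
else
  echo "optimum-cli not found; try: pip install 'optimum[exporters,onnxruntime]'"
fi
```

The quantized output directory can then be loaded locally with onnxruntime, or used with Transformers.js if the files follow its expected layout.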