Add/update the quantized ONNX model files and README.md for Transformers.js v3
Applied Quantizations
✅ Based on decoder_model.onnx with slimming
↳ ✅ fp16: decoder_model_fp16.onnx (added)
↳ ✅ int8: decoder_model_int8.onnx (added)
↳ ✅ uint8: decoder_model_uint8.onnx (added)
↳ ✅ q4: decoder_model_q4.onnx (added)
↳ ✅ q4f16: decoder_model_q4f16.onnx (added)
↳ ✅ bnb4: decoder_model_bnb4.onnx (added)
✅ Based on encoder_model.onnx with slimming
↳ ✅ int8: encoder_model_int8.onnx (added)
↳ ✅ uint8: encoder_model_uint8.onnx (added)
↳ ✅ q4: encoder_model_q4.onnx (added)
↳ ✅ q4f16: encoder_model_q4f16.onnx (added)
↳ ✅ bnb4: encoder_model_bnb4.onnx (added)
✅ Based on decoder_with_past_model.onnx with slimming
↳ ✅ fp16: decoder_with_past_model_fp16.onnx (added)
↳ ✅ int8: decoder_with_past_model_int8.onnx (added)
↳ ✅ uint8: decoder_with_past_model_uint8.onnx (added)
↳ ✅ q4: decoder_with_past_model_q4.onnx (added)
↳ ✅ q4f16: decoder_with_past_model_q4f16.onnx (added)
↳ ✅ bnb4: decoder_with_past_model_bnb4.onnx (added)
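The file names listed above follow a visible pattern: the base file's stem, an underscore, the quantization mode, then the `.onnx` extension. The sketch below only reproduces that pattern for illustration; `quantized_name` and `MODES` are hypothetical names, not part of the quantization scripts.

```python
from pathlib import PurePosixPath

# Quantization modes that appear in this listing.
MODES = ("fp16", "int8", "uint8", "q4", "q4f16", "bnb4")


def quantized_name(base_file: str, mode: str) -> str:
    """Build the output file name for `base_file` quantized with `mode`.

    Hypothetical helper: it mirrors the naming seen in the listing,
    e.g. decoder_model.onnx + q4 -> decoder_model_q4.onnx.
    """
    if mode not in MODES:
        raise ValueError(f"unknown quantization mode: {mode!r}")
    path = PurePosixPath(base_file)
    return f"{path.stem}_{mode}{path.suffix}"


if __name__ == "__main__":
    for mode in MODES:
        print(quantized_name("decoder_with_past_model.onnx", mode))
```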
❌ Based on decoder_model_merged.onnx with slimming
0%| | 0/1 [00:00<?, ?it/s]
Processing /tmp/tmp9nbcmgpl/decoder_model_merged.onnx: 0%| | 0/1 [00:00<?, ?it/s]
0%| | 0/6 [00:00<?, ?it/s]
- Quantizing to fp16: 0%| | 0/6 [00:00<?, ?it/s]
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.697138316662631e-10 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -8.893208836013855e-09 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 8.164762732576492e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 2.8440178212463252e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -6.59005578995675e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:85: UserWarning: the float32 number -3.4028234663852886e+38 will be truncated to -10000.0
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 2.889956540741423e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -2.6373411898816812e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -7.866925244570666e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 6.710493494210823e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -6.309365119250288e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 2.5587501895074638e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -4.7406583547626724e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 4.947884235662059e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.595091501220395e-09 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -7.471919616364175e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 1.863419973879843e-10 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -1.5860015523117e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -9.811487444721934e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -3.936450809760572e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 1.9050668598197262e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -7.12030754357329e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 2.4172115420384443e-09 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 6.794098084128564e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -6.780636141456853e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.139340520874612e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.843179641194183e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 9.474867823655586e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -7.072906527127998e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 8.883689162075825e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -6.64872388256299e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -3.740089837833693e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -6.712274114306638e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 7.034567928165814e-10 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -6.457877788079713e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -2.0287941993046843e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -4.035070588770395e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 1.6361820343036015e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -4.2513253362130854e-08 will be truncated to -1e-07
warnings.warn(
- Quantizing to fp16: 0%| | 0/6 [00:04<?, ?it/s]
Processing /tmp/tmp9nbcmgpl/decoder_model_merged.onnx: 0%| | 0/1 [00:04<?, ?it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 377, in <module>
main()
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 374, in main
quantize(input_folder, output_folder, quantization_args)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 309, in quantize
quantize_fp16(
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 223, in quantize_fp16
check_and_save_model(model_fp16, save_path)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 29, in check_and_save_model
strict_check_model(model)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 21, in strict_check_model
raise e
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 16, in strict_check_model
onnx.checker.check_model(model_or_path, full_check=True)
File "/home/ubuntu/.cache/uv/archive-v0/7hYcxZ8pwavXeKpAYRaHY/lib/python3.12/site-packages/onnx/checker.py", line 179, in check_model
C.check_model(
onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] Inference error(s): (op_type:If, node name: optimum::if): [ShapeInferenceError] Inference error(s): (op_type:Add, node name: /model/decoder/embed_positions/Add): [ShapeInferenceError] Inferred shape and existing shape differ in rank: (1) vs (0)
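The `UserWarning` lines in the log above show a consistent pattern: float32 values whose magnitude is below 1e-07 are raised to ±1e-07, and the very large negative value is limited to -10000.0. A minimal pure-Python sketch of that clamping follows. The thresholds are read off the warnings and the function name `clamp_for_fp16` is hypothetical; this is not the code in `float16.py`, and leaving zero, NaN and infinities unchanged is a choice made for this sketch.

```python
import math

# Thresholds inferred from the warnings in the log; assumptions, not a
# documented contract of the conversion script.
MIN_POSITIVE = 1e-7
MAX_FINITE = 1e4


def clamp_for_fp16(x: float,
                   min_positive: float = MIN_POSITIVE,
                   max_finite: float = MAX_FINITE) -> float:
    """Clamp a finite, non-zero value into [min_positive, max_finite] by magnitude."""
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        return x  # left untouched in this sketch
    magnitude = abs(x)
    if magnitude < min_positive:
        return math.copysign(min_positive, x)
    if magnitude > max_finite:
        return math.copysign(max_finite, x)
    return x


if __name__ == "__main__":
    # Values copied from the warnings above.
    for value in (5.697138316662631e-10,
                  -8.893208836013855e-09,
                  -3.4028234663852886e+38):
        print(value, "->", clamp_for_fp16(value))
```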
✅ Based on decoder_model_merged.onnx without slimming
↳ ✅ fp16: decoder_model_merged_fp16.onnx (replaced because it was invalid)
↳ ✅ int8: decoder_model_merged_int8.onnx (added)
↳ ✅ uint8: decoder_model_merged_uint8.onnx (added)
↳ ✅ q4: decoder_model_merged_q4.onnx (added)
↳ ✅ q4f16: decoder_model_merged_q4f16.onnx (added)
↳ ✅ bnb4: decoder_model_merged_bnb4.onnx (added)
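For decoder_model_merged.onnx, the run with slimming failed the ONNX check and the run without slimming succeeded, which suggests a try-then-fall-back flow. The sketch below illustrates that flow under stated assumptions: `quantize_with_fallback` and the `fake_quantize` stub are hypothetical and stand in for whatever the migration tooling actually does.

```python
from typing import Callable, List, Tuple


def quantize_with_fallback(
    model_file: str,
    quantize: Callable[[str, bool], List[str]],
) -> Tuple[bool, List[str]]:
    """Try quantizing with slimming first; retry without slimming on failure.

    Returns (used_slimming, output_files).
    """
    try:
        return True, quantize(model_file, True)
    except Exception as err:  # e.g. a shape inference error from the checker
        print(f"❌ {model_file} with slimming: {err}")
    return False, quantize(model_file, False)


def fake_quantize(model_file: str, slim: bool) -> List[str]:
    """Hypothetical stub: fails for the merged decoder when slimming is on."""
    if slim and "merged" in model_file:
        raise RuntimeError("Inferred shape and existing shape differ in rank")
    stem = model_file.removesuffix(".onnx")
    return [f"{stem}_{mode}.onnx" for mode in ("fp16", "int8", "q4")]


if __name__ == "__main__":
    used_slimming, files = quantize_with_fallback(
        "decoder_model_merged.onnx", fake_quantize
    )
    print("slimming used:", used_slimming)
    print(files)
```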