Model Overview
- Model Architecture: BERT
- Input: Text
- Output: Text
- Model Optimizations:
- Maximum Context Length: 384 tokens
- Intended Use Cases: Intended for commercial and non-commercial use. Like google-bert/bert-large-uncased, this model is intended for question answering.
- Release Date: 04/12/2025
- Version: v2025.2
- License(s): Apache License 2.0
- Supported Inference Engine(s): Furiosa LLM
- Supported Hardware Compatibility: FuriosaAI RNGD
- Preferred Operating System(s): Linux
- Quantization:
- Tool: Furiosa Model Compressor v0.6.2, included in Furiosa SDK 2025.2
- Weight: int8, Activation: int8, KV cache: int8
- Calibration: SQuAD v1.1 dataset (instruction), 100 samples
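The W8A8 scheme above maps floating-point weights and activations to 8-bit integers using scales derived from calibration data. As a minimal illustration of the idea (not the Furiosa Model Compressor's actual algorithm, and the function names here are illustrative), symmetric per-tensor int8 quantization can be sketched as:

```python
def quantize_int8(values):
    # Symmetric per-tensor int8 quantization: map floats into [-127, 127]
    # using a single scale derived from the largest absolute value.
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    # Recover approximate floating-point values from int8 codes.
    return [x * scale for x in q]
```

In practice the calibration samples (here, 100 SQuAD v1.1 examples) are used to observe activation ranges so that scales can be chosen before inference.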
Description:
This model is the pre-compiled version of google-bert/bert-large-uncased, an embedding model that uses an optimized transformer architecture.
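For the question-answering task this model targets, a BERT-style QA head produces start and end logits over the input tokens, and the answer is the highest-scoring valid span. A minimal sketch of that span-selection step (illustrative only; not part of the Furiosa SDK API):

```python
def extract_answer_span(start_logits, end_logits, max_answer_len=30):
    # Pick the (start, end) token pair with the highest combined logit
    # score, subject to start <= end and a maximum span length.
    best, best_score = (0, 0), float("-inf")
    for s, s_logit in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best
```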
Usage
MLPerf Benchmark using RNGD
Follow the example command below after installing furiosa-mlperf and its prerequisites.

```shell
furiosa-mlperf bert-offline furiosa-ai/bert-large-uncased-INT8-MLPerf ./mlperf-result
```
Base model: google-bert/bert-large-uncased