Model Overview
- Model Architecture: BERT
- Input: Text
- Output: Text
- Model Optimizations:
- Maximum Context Length: 384 tokens
- Intended Use Cases: Intended for commercial and non-commercial use. Like google-bert/bert-large-uncased, this model is intended for question answering.
- Release Date: 04/12/2025
- Version: v2025.2
- License(s): Apache License 2.0
- Supported Inference Engine(s): Furiosa LLM
- Supported Hardware Compatibility: FuriosaAI RNGD
- Preferred Operating System(s): Linux
- Quantization:
- Tool: Furiosa Model Compressor v0.6.2, included in Furiosa SDK 2025.2
- Weight: int8, Activation: int8, KV cache: int8
- Calibration: SQuAD v1.1 dataset (instruction), 100 samples
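The W8A8 scheme above maps floating-point weights and activations to 8-bit integers using scales derived from calibration data. As a minimal illustration of the idea (not the Furiosa Model Compressor's actual algorithm, and the function names here are illustrative), symmetric per-tensor int8 quantization can be sketched as:

```python
def quantize_int8(values):
    # Symmetric per-tensor int8 quantization: map floats into [-127, 127]
    # using a single scale derived from the largest absolute value.
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    # Recover approximate floating-point values from int8 codes.
    return [x * scale for x in q]
```

In practice the calibration samples (here, 100 SQuAD v1.1 examples) are used to observe activation ranges so that scales can be chosen before inference.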
Description:
This model is the pre-compiled version of google-bert/bert-large-uncased, an embedding model that uses an optimized transformer architecture.
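For the question-answering task this model targets, a BERT-style QA head produces start and end logits over the input tokens, and the answer is the highest-scoring valid span. A minimal sketch of that span-selection step (illustrative only; not part of the Furiosa SDK API):

```python
def extract_answer_span(start_logits, end_logits, max_answer_len=30):
    # Pick the (start, end) token pair with the highest combined logit
    # score, subject to start <= end and a maximum span length.
    best, best_score = (0, 0), float("-inf")
    for s, s_logit in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best
```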
Usage
MLPerf Benchmark using RNGD
Follow the example command below after installing furiosa-mlperf and its prerequisites.

```shell
furiosa-mlperf bert-offline furiosa-ai/bert-large-uncased-INT8-MLPerf ./mlperf-result
```
Base model: google-bert/bert-large-uncased