NexaAI/gemma-3n-E4B-it-4bit-MLX

Quickstart

With nexa-sdk installed, run the model directly from the nexa-sdk CLI:

NexaAI/gemma-3n-E4B-it-4bit-MLX
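A minimal invocation sketch; the `nexa infer` subcommand is an assumption based on common nexa-sdk usage and may differ by version, so verify the exact syntax with `nexa --help`:

```shell
# Hypothetical subcommand; check `nexa --help` for your installed nexa-sdk version.
nexa infer NexaAI/gemma-3n-E4B-it-4bit-MLX
```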

Overview

Summary description and brief definition of inputs and outputs.

Description

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3n models are designed for efficient execution on low-resource devices. They are capable of multimodal input, handling text, image, video, and audio input, and generating text outputs, with open weights for pre-trained and instruction-tuned variants. These models were trained with data in over 140 spoken languages.

Gemma 3n models use selective parameter activation technology to reduce resource requirements. This technique allows the models to operate at an effective size of 2B and 4B parameters, which is lower than the total number of parameters they contain. For more information on Gemma 3n's efficient parameter management technology, see the Gemma 3n page.

Inputs and outputs

  • Input:
    • Text string, such as a question, a prompt, or a document to be summarized
    • Images, normalized to 256x256, 512x512, or 768x768 resolution and encoded to 256 tokens each
    • Audio data encoded to 6.25 tokens per second from a single channel
    • Total input context of 32K tokens
  • Output:
    • Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document
    • Total output length up to 32K tokens, subtracting the request input tokens
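The token budget above can be sketched as simple arithmetic. The per-modality costs (256 tokens per image, 6.25 tokens per second of audio, a 32K-token context shared between input and output) come from the list above; the prompt composition in the example is hypothetical:

```python
# Token-budget sketch for a multimodal Gemma 3n request.
# Costs from the model card: each image -> 256 tokens,
# audio -> 6.25 tokens per second, 32K-token total context.
IMAGE_TOKENS = 256
AUDIO_TOKENS_PER_SEC = 6.25
CONTEXT_WINDOW = 32 * 1024  # 32K tokens shared by input and output

def input_tokens(text_tokens: int, n_images: int = 0, audio_seconds: float = 0.0) -> int:
    """Approximate input-token count for a mixed text/image/audio prompt."""
    return text_tokens + n_images * IMAGE_TOKENS + int(audio_seconds * AUDIO_TOKENS_PER_SEC)

def max_output_tokens(text_tokens: int, n_images: int = 0, audio_seconds: float = 0.0) -> int:
    """Tokens left for generation after the input is accounted for."""
    return CONTEXT_WINDOW - input_tokens(text_tokens, n_images, audio_seconds)

# Hypothetical request: 1,000 text tokens, two images, 60 s of audio.
print(input_tokens(1000, n_images=2, audio_seconds=60))       # 1000 + 512 + 375 = 1887
print(max_output_tokens(1000, n_images=2, audio_seconds=60))  # 32768 - 1887 = 30881
```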

Benchmark Results

These models were evaluated at full precision (float32) against a large collection of different datasets and metrics to cover different aspects of content generation. Evaluation results marked with IT are for instruction-tuned models. Evaluation results marked with PT are for pre-trained models.

Reasoning and factuality

| Benchmark         | Metric         | n-shot   | E2B PT | E4B PT |
|-------------------|----------------|----------|--------|--------|
| HellaSwag         | Accuracy       | 10-shot  | 72.2   | 78.6   |
| BoolQ             | Accuracy       | 0-shot   | 76.4   | 81.6   |
| PIQA              | Accuracy       | 0-shot   | 78.9   | 81.0   |
| SocialIQA         | Accuracy       | 0-shot   | 48.8   | 50.0   |
| TriviaQA          | Accuracy       | 5-shot   | 60.8   | 70.2   |
| Natural Questions | Accuracy       | 5-shot   | 15.5   | 20.9   |
| ARC-c             | Accuracy       | 25-shot  | 51.7   | 61.6   |
| ARC-e             | Accuracy       | 0-shot   | 75.8   | 81.6   |
| WinoGrande        | Accuracy       | 5-shot   | 66.8   | 71.7   |
| BIG-Bench Hard    | Accuracy       | few-shot | 44.3   | 52.9   |
| DROP              | Token F1 score | 1-shot   | 53.9   | 60.8   |

Multilingual

| Benchmark       | Metric                  | n-shot | E2B IT | E4B IT |
|-----------------|-------------------------|--------|--------|--------|
| MGSM            | Accuracy                | 0-shot | 53.1   | 60.7   |
| WMT24++ (ChrF)  | Character-level F-score | 0-shot | 42.7   | 50.1   |
| Include         | Accuracy                | 0-shot | 38.6   | 57.2   |
| MMLU (ProX)     | Accuracy                | 0-shot | 8.1    | 19.9   |
| OpenAI MMLU     | Accuracy                | 0-shot | 22.3   | 35.6   |
| Global-MMLU     | Accuracy                | 0-shot | 55.1   | 60.3   |
| ECLeKTic        | ECLeKTic score          | 0-shot | 2.5    | 1.9    |

STEM and code

| Benchmark        | Metric                   | n-shot | E2B IT | E4B IT |
|------------------|--------------------------|--------|--------|--------|
| GPQA Diamond     | RelaxedAccuracy/accuracy | 0-shot | 24.8   | 23.7   |
| LiveCodeBench v5 | pass@1                   | 0-shot | 18.6   | 25.7   |
| Codegolf v2.2    | pass@1                   | 0-shot | 11.0   | 16.8   |
| AIME 2025        | Accuracy                 | 0-shot | 6.7    | 11.6   |

Additional benchmarks

| Benchmark        | Metric   | n-shot | E2B IT | E4B IT |
|------------------|----------|--------|--------|--------|
| MMLU             | Accuracy | 0-shot | 60.1   | 64.9   |
| MBPP             | pass@1   | 3-shot | 56.6   | 63.6   |
| HumanEval        | pass@1   | 0-shot | 66.5   | 75.0   |
| LiveCodeBench    | pass@1   | 0-shot | 13.2   | 13.2   |
| HiddenMath       | Accuracy | 0-shot | 27.7   | 37.7   |
| Global-MMLU-Lite | Accuracy | 0-shot | 59.0   | 64.5   |
| MMLU (Pro)       | Accuracy | 0-shot | 40.5   | 50.6   |

Reference

Original model card: google/gemma-3n-E4B-it
