
	---
	license: gemma
	library_name: transformers
	pipeline_tag: image-text-to-text
	extra_gated_heading: Access Gemma on Hugging Face
	extra_gated_prompt: To access Gemma on Hugging Face, you’re required to review and
	  agree to Google’s usage license. To do this, please ensure you’re logged in to Hugging
	  Face and click below. Requests are processed immediately.
	extra_gated_button_content: Acknowledge license
	base_model: google/gemma-3n-E4B
	tags:
	- automatic-speech-recognition
	- automatic-speech-translation
	- audio-text-to-text
	- video-text-to-text
	- mlx
	---

	# NexaAI/gemma-3n-E4B-it-4bit-MLX

	## Quickstart

	Run this model directly with [nexa-sdk](https://github.com/NexaAI/nexa-sdk) installed.
	In the nexa-sdk CLI:

	```bash
	NexaAI/gemma-3n-E4B-it-4bit-MLX
	```

	## Overview

	Summary description and brief definition of inputs and outputs.

	#### Description

	Gemma is a family of lightweight, state-of-the-art open models from Google,
	built from the same research and technology used to create the Gemini models.
	Gemma 3n models are designed for efficient execution on low-resource devices.
	They are capable of multimodal input, handling text, image, video, and audio
	input, and generating text outputs, with open weights for pre-trained and
	instruction-tuned variants. These models were trained with data in over 140
	spoken languages.

	Gemma 3n models use selective parameter activation technology to reduce resource
	requirements. This technique allows the models to operate at an effective size
	of 2B and 4B parameters, which is lower than the total number of parameters they
	contain. For more information on Gemma 3n's efficient parameter management
	technology, see the
	[Gemma 3n](https://ai.google.dev/gemma/docs/gemma-3n#parameters)
	page.

	#### Inputs and outputs

	- Input:
	    - Text string, such as a question, a prompt, or a document to be
	      summarized
	    - Images, normalized to 256x256, 512x512, or 768x768 resolution
	      and encoded to 256 tokens each
	    - Audio data encoded to 6.25 tokens per second from a single channel
	    - Total input context of 32K tokens
	- Output:
	    - Generated text in response to the input, such as an answer to a
	      question, analysis of image content, or a summary of a document
	    - Total output length up to 32K tokens, subtracting the request
	      input tokens
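The figures in the list above can be combined into a rough token budget for a single request. The following is a minimal sketch, not part of any Gemma or nexa-sdk API: the function names are hypothetical, "32K" is assumed to mean 32,768 tokens, partial audio tokens are assumed to round up, and the text token count is assumed to come from whatever tokenizer you use.

```python
import math

# Figures taken from the list above.
TOKENS_PER_IMAGE = 256
AUDIO_TOKENS_PER_SECOND = 6.25

# Assumption: "32K" is read as 32 * 1024 tokens.
CONTEXT_TOKENS = 32 * 1024


def input_tokens(text_tokens: int, num_images: int = 0,
                 audio_seconds: float = 0.0) -> int:
    """Estimate the input tokens consumed by one request.

    Assumption: a fractional audio token is rounded up to a whole token.
    """
    audio_tokens = math.ceil(audio_seconds * AUDIO_TOKENS_PER_SECOND)
    return text_tokens + num_images * TOKENS_PER_IMAGE + audio_tokens


def remaining_output_tokens(text_tokens: int, num_images: int = 0,
                            audio_seconds: float = 0.0) -> int:
    """Estimate the output tokens left after subtracting the request input."""
    used = input_tokens(text_tokens, num_images, audio_seconds)
    return max(0, CONTEXT_TOKENS - used)


if __name__ == "__main__":
    # Hypothetical request: 500 text tokens, 2 images, 30 seconds of audio.
    print(input_tokens(500, num_images=2, audio_seconds=30))
    print(remaining_output_tokens(500, num_images=2, audio_seconds=30))
```

Under these assumptions, a request with 500 text tokens, two images, and 30 seconds of audio uses about 1,200 input tokens, leaving roughly 31.5K tokens for the response.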

	## Benchmark Results

	These models were evaluated at full precision (float32) against a large
	collection of different datasets and metrics to cover different aspects of
	content generation. Evaluation results marked with IT are for
	instruction-tuned models. Evaluation results marked with PT are for
	pre-trained models.

	#### Reasoning and factuality

	| Benchmark                      | Metric         | n-shot   |  E2B PT  |  E4B PT  |
	| ------------------------------ |----------------|----------|:--------:|:--------:|
	| [HellaSwag][hellaswag]         | Accuracy       | 10-shot  |   72.2   |   78.6   |
	| [BoolQ][boolq]                 | Accuracy       | 0-shot   |   76.4   |   81.6   |
	| [PIQA][piqa]                   | Accuracy       | 0-shot   |   78.9   |   81.0   |
	| [SocialIQA][socialiqa]         | Accuracy       | 0-shot   |   48.8   |   50.0   |
	| [TriviaQA][triviaqa]           | Accuracy       | 5-shot   |   60.8   |   70.2   |
	| [Natural Questions][naturalq]  | Accuracy       | 5-shot   |   15.5   |   20.9   |
	| [ARC-c][arc]                   | Accuracy       | 25-shot  |   51.7   |   61.6   |
	| [ARC-e][arc]                   | Accuracy       | 0-shot   |   75.8   |   81.6   |
	| [WinoGrande][winogrande]       | Accuracy       | 5-shot   |   66.8   |   71.7   |
	| [BIG-Bench Hard][bbh]          | Accuracy       | few-shot |   44.3   |   52.9   |
	| [DROP][drop]                   | Token F1 score | 1-shot   |   53.9   |   60.8   |

	[hellaswag]: https://arxiv.org/abs/1905.07830
	[boolq]: https://arxiv.org/abs/1905.10044
	[piqa]: https://arxiv.org/abs/1911.11641
	[socialiqa]: https://arxiv.org/abs/1904.09728
	[triviaqa]: https://arxiv.org/abs/1705.03551
	[naturalq]: https://github.com/google-research-datasets/natural-questions
	[arc]: https://arxiv.org/abs/1911.01547
	[winogrande]: https://arxiv.org/abs/1907.10641
	[bbh]: https://paperswithcode.com/dataset/bbh
	[drop]: https://arxiv.org/abs/1903.00161
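To see how much the larger pre-trained model gains on each reasoning and factuality benchmark, the scores in the table above can be compared directly. This is an illustrative sketch only: the numbers are copied from the table, the variable and function names are hypothetical, and because the rows mix accuracy with a token F1 score, the average should be read as a rough indicator rather than a meaningful aggregate metric.

```python
# (E2B PT, E4B PT) scores copied from the reasoning and factuality table.
SCORES = {
    "HellaSwag": (72.2, 78.6),
    "BoolQ": (76.4, 81.6),
    "PIQA": (78.9, 81.0),
    "SocialIQA": (48.8, 50.0),
    "TriviaQA": (60.8, 70.2),
    "Natural Questions": (15.5, 20.9),
    "ARC-c": (51.7, 61.6),
    "ARC-e": (75.8, 81.6),
    "WinoGrande": (66.8, 71.7),
    "BIG-Bench Hard": (44.3, 52.9),
    "DROP": (53.9, 60.8),
}


def gains(scores: dict) -> dict:
    """Return the E4B minus E2B difference per benchmark, in points."""
    return {name: round(e4b - e2b, 1) for name, (e2b, e4b) in scores.items()}


def mean_gain(scores: dict) -> float:
    """Return the unweighted average gain across all benchmarks."""
    deltas = gains(scores)
    return sum(deltas.values()) / len(deltas)


if __name__ == "__main__":
    # Print benchmarks from largest to smallest gain.
    for name, delta in sorted(gains(SCORES).items(), key=lambda kv: -kv[1]):
        print(f"{name:<20} +{delta:.1f}")
    print(f"mean gain: {mean_gain(SCORES):.2f}")
```

In this table, E4B scores higher than E2B on every row, with the largest gap on ARC-c and the smallest on SocialIQA.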

	#### Multilingual

	| Benchmark                    | Metric                  | n-shot   |  E2B IT  |  E4B IT  |
	| ---------------------------- |-------------------------|----------|:--------:|:--------:|
	| [MGSM][mgsm]                 | Accuracy                | 0-shot   |   53.1   |   60.7   |
	| [WMT24++][wmt24pp] (ChrF)    | Character-level F-score | 0-shot   |   42.7   |   50.1   |
	| [Include][include]           | Accuracy                | 0-shot   |   38.6   |   57.2   |
	| [MMLU][mmlu] (ProX)          | Accuracy                | 0-shot   |    8.1   |   19.9   |
	| [OpenAI MMLU][openai-mmlu]   | Accuracy                | 0-shot   |   22.3   |   35.6   |
	| [Global-MMLU][global-mmlu]   | Accuracy                | 0-shot   |   55.1   |   60.3   |
	| [ECLeKTic][eclektic]         | ECLeKTic score          | 0-shot   |    2.5   |    1.9   |

	[mgsm]: https://arxiv.org/abs/2210.03057
	[wmt24pp]: https://arxiv.org/abs/2502.12404v1
	[include]: https://arxiv.org/abs/2411.19799
	[mmlu]: https://arxiv.org/abs/2009.03300
	[openai-mmlu]: https://huggingface.co/datasets/openai/MMMLU
	[global-mmlu]: https://huggingface.co/datasets/CohereLabs/Global-MMLU
	[eclektic]: https://arxiv.org/abs/2502.21228

	#### STEM and code

	| Benchmark                 | Metric                   | n-shot   |  E2B IT  |  E4B IT  |
	| ------------------------- |--------------------------|----------|:--------:|:--------:|
	| [GPQA][gpqa] Diamond      | RelaxedAccuracy/accuracy | 0-shot   |   24.8   |   23.7   |
	| [LiveCodeBench][lcb] v5   | pass@1                   | 0-shot   |   18.6   |   25.7   |
	| Codegolf v2.2             | pass@1                   | 0-shot   |   11.0   |   16.8   |
	| [AIME 2025][aime-2025]    | Accuracy                 | 0-shot   |    6.7   |   11.6   |

	[gpqa]: https://arxiv.org/abs/2311.12022
	[lcb]: https://arxiv.org/abs/2403.07974
	[aime-2025]: https://www.vals.ai/benchmarks/aime-2025-05-09
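Several rows above report pass@1. The card does not say how many samples were drawn per problem or how the metric was computed, so the following is only a sketch of the commonly used unbiased pass@k estimator described in the HumanEval paper, not a description of the evaluation behind these numbers. The function name and arguments are illustrative: `n` is the number of generated samples for a problem, `c` is how many of them pass the tests, and `k` is the sampling budget being scored.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem as 1 - C(n - c, k) / C(n, k).

    n: total samples generated, c: samples that pass, k: budget (1 <= k <= n).
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        # Fewer than k failing samples exist, so any k draws include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # With k = 1 the estimate reduces to the fraction of passing samples, c / n.
    print(pass_at_k(n=10, c=3, k=1))
    print(pass_at_k(n=10, c=3, k=5))
```

A benchmark-level score is then the mean of this per-problem estimate over all problems. With a single sample per problem, pass@1 is simply the fraction of problems solved on the first attempt.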

	#### Additional benchmarks

	| Benchmark                            | Metric     | n-shot   |  E2B IT  |  E4B IT  |
	| ------------------------------------ |------------|----------|:--------:|:--------:|
	| [MMLU][mmlu]                         | Accuracy   | 0-shot   |   60.1   |   64.9   |
	| [MBPP][mbpp]                         | pass@1     | 3-shot   |   56.6   |   63.6   |
	| [HumanEval][humaneval]               | pass@1     | 0-shot   |   66.5   |   75.0   |
	| [LiveCodeBench][lcb]                 | pass@1     | 0-shot   |   13.2   |   13.2   |
	| HiddenMath                           | Accuracy   | 0-shot   |   27.7   |   37.7   |
	| [Global-MMLU-Lite][global-mmlu-lite] | Accuracy   | 0-shot   |   59.0   |   64.5   |
	| [MMLU][mmlu] (Pro)                   | Accuracy   | 0-shot   |   40.5   |   50.6   |

	[mbpp]: https://arxiv.org/abs/2108.07732
	[humaneval]: https://arxiv.org/abs/2107.03374
	[global-mmlu-lite]: https://huggingface.co/datasets/CohereForAI/Global-MMLU-Lite

	## Reference
	Original model card: [google/gemma-3n-E4B-it](https://huggingface.co/google/gemma-3n-E4B-it)