---
pipeline_tag: text-generation
base_model: nvidia/Llama-3.1-8B-UltraLong-4M-Instruct
base_model_relation: quantized
tags:
- chat
- 4bit
- apple
- long-context
license: cc-by-nc-4.0
language:
- en
- fr
- es
- de
- it
- hi
- ru
library_name: mlx
---

# Llama 3.1 8B UltraLong 4M Instruct 4-bit MLX

MLX version of **Llama 3.1 8B UltraLong 4M Instruct**, quantized to 4-bit.

This model was converted to MLX format from [`nvidia/Llama-3.1-8B-UltraLong-4M-Instruct`](https://huggingface.co/nvidia/Llama-3.1-8B-UltraLong-4M-Instruct) using mlx-lm version **0.22.5**.

## Model Details

Maximum context window: 4M tokens.

For more details, please refer to the paper on [arXiv](https://arxiv.org/abs/2504.06214).

## Use with mlx

```bash
pip install -U mlx-lm
```

```bash
python -m mlx_lm.generate --model TheCluster/Llama-3.1-8B-UltraLong-4M-Instruct-mlx-4bit --max-tokens 65536 --temp 0.5 --prompt "Your big prompt"
```
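
You can also drive the model from Python through the mlx-lm API. The snippet below is a minimal sketch following mlx-lm's standard usage pattern; it assumes a recent mlx-lm release (0.22.x), where sampling parameters are passed via `make_sampler` from `mlx_lm.sample_utils` rather than as a `temp` keyword to `generate`.

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Download (on first use) and load the 4-bit weights and tokenizer.
model, tokenizer = load("TheCluster/Llama-3.1-8B-UltraLong-4M-Instruct-mlx-4bit")

# Wrap the raw prompt in the Llama 3.1 chat template.
messages = [{"role": "user", "content": "Your big prompt"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Temperature 0.5, matching the CLI example above.
sampler = make_sampler(temp=0.5)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=65536,
    sampler=sampler,
    verbose=True,  # stream tokens and report generation speed
)
```

Note that at very long contexts the KV cache, not the 4-bit weights, dominates memory use; the `mlx_lm.generate` CLI exposes a `--max-kv-size` option to bound the cache, at the cost of discarding distant context.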