---
pipeline_tag: text-generation
base_model: nvidia/Llama-3.1-8B-UltraLong-4M-Instruct
base_model_relation: quantized
tags:
- chat
- 4bit
- apple
- long-context
license: cc-by-nc-4.0
language:
- en
- fr
- es
- de
- it
- hi
- ru
library_name: mlx
---

# Llama 3.1 8B UltraLong 4M Instruct 4-bit MLX

MLX version of **Llama 3.1 8B UltraLong 4M Instruct**, quantized to 4-bit.

This model was converted to MLX format from [`nvidia/Llama-3.1-8B-UltraLong-4M-Instruct`](https://huggingface.co/nvidia/Llama-3.1-8B-UltraLong-4M-Instruct) using mlx-lm version **0.22.5**.

## Model Details

Maximum context window: 4M tokens.

For more details, please refer to the paper on [arXiv](https://arxiv.org/abs/2504.06214).

## Use with mlx

```bash
pip install -U mlx-lm
```

```bash
python -m mlx_lm.generate --model TheCluster/Llama-3.1-8B-UltraLong-4M-Instruct-mlx-4bit --max-tokens 65536 --temp 0.5 --prompt "Your big prompt"
```
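
You can also drive the model from Python through the mlx-lm API. The snippet below is a minimal sketch following mlx-lm's standard usage pattern; it assumes a recent mlx-lm release (0.22.x), where sampling parameters are passed via `make_sampler` from `mlx_lm.sample_utils` rather than as a `temp` keyword to `generate`.

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Download (on first use) and load the 4-bit weights and tokenizer.
model, tokenizer = load("TheCluster/Llama-3.1-8B-UltraLong-4M-Instruct-mlx-4bit")

# Wrap the raw prompt in the Llama 3.1 chat template.
messages = [{"role": "user", "content": "Your big prompt"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Temperature 0.5, matching the CLI example above.
sampler = make_sampler(temp=0.5)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=65536,
    sampler=sampler,
    verbose=True,  # stream tokens and report generation speed
)
```

Note that at very long contexts the KV cache, not the 4-bit weights, dominates memory use; the `mlx_lm.generate` CLI exposes a `--max-kv-size` option to bound the cache, at the cost of discarding distant context.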