
FuriosaAI develops data center AI accelerators. Our RNGD (pronounced “Renegade”) accelerator, currently sampling, excels at high-performance inference for LLMs and agentic AI.

Get started quickly with common inference tasks on RNGD using these pre-compiled popular Hugging Face models – no manual conversion or quantization needed. These models require Furiosa SDK 2025.2 or later on a server with an RNGD accelerator.

Need a model with a custom configuration? Compile it yourself using our Model Preparation Workflow on Furiosa Docs. See Supported Models in the SDK documentation for the full list, and learn more about RNGD at https://furiosa.ai/rngd.

Pre-compiled models

Please check out the collection of models at https://huggingface.co/furiosa-ai/collections.

| Pre-compiled Model | Description | Base Model | Supported SDK Version |
|---|---|---|---|
| furiosa-ai/bert-large-uncased-INT8-MLPerf | INT8 quantized, optimized for MLPerf | google-bert/bert-large-uncased | 2025.2 |
| furiosa-ai/gpt-j-6b-FP8-MLPerf | FP8 quantized, optimized for MLPerf | EleutherAI/gpt-j-6b | 2025.2 |
| furiosa-ai/DeepSeek-R1-Distill-Llama-8B | BF16 | deepseek-ai/DeepSeek-R1-Distill-Llama-8B | >= 2025.3 |
| furiosa-ai/DeepSeek-R1-Distill-Llama-70B | BF16 | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | >= 2025.3 |
| furiosa-ai/DeepSeek-R1-Distill-Qwen-7B | BF16 | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | >= 2025.3 |
| furiosa-ai/DeepSeek-R1-Distill-Qwen-14B | BF16 | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | >= 2025.3 |
| furiosa-ai/DeepSeek-R1-Distill-Qwen-32B | BF16 | deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | >= 2025.3 |
| furiosa-ai/EXAONE-3.5-7.8B-Instruct | BF16 | LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct | >= 2025.2 |
| furiosa-ai/EXAONE-3.5-32B-Instruct | BF16 | LGAI-EXAONE/EXAONE-3.5-32B-Instruct | >= 2025.2 |
| furiosa-ai/Llama-3.1-8B-Instruct | BF16 | meta-llama/Llama-3.1-8B-Instruct | >= 2025.2 |
| furiosa-ai/Llama-3.1-8B-Instruct-FP8 | FP8 quantized | meta-llama/Llama-3.1-8B-Instruct | >= 2025.2 |
| furiosa-ai/Llama-3.3-70B-Instruct | BF16 | meta-llama/Llama-3.3-70B-Instruct | >= 2025.3 |
| furiosa-ai/Llama-3.3-70B-Instruct-INT8 | INT8 weight quantization | meta-llama/Llama-3.3-70B-Instruct | >= 2025.3 |
| furiosa-ai/Qwen2.5-7B-Instruct | BF16 | Qwen/Qwen2.5-7B-Instruct | >= 2025.3 |
| furiosa-ai/Qwen2.5-14B-Instruct | BF16 | Qwen/Qwen2.5-14B-Instruct | >= 2025.3 |
| furiosa-ai/Qwen2.5-32B-Instruct | BF16 | Qwen/Qwen2.5-32B-Instruct | >= 2025.3 |
| furiosa-ai/Qwen2.5-Coder-7B-Instruct | BF16 | Qwen/Qwen2.5-Coder-7B-Instruct | >= 2025.3 |
| furiosa-ai/Qwen2.5-Coder-14B-Instruct | BF16 | Qwen/Qwen2.5-Coder-14B-Instruct | >= 2025.3 |
| furiosa-ai/Qwen2.5-Coder-32B-Instruct | BF16 | Qwen/Qwen2.5-Coder-32B-Instruct | >= 2025.3 |
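
The pre-compiled artifacts above are regular Hugging Face model repositories, so you can pre-fetch one into your local cache before serving it. A minimal sketch using the standard huggingface_hub CLI (if a repository is gated, you may need to run huggingface-cli login first):

# Install the Hugging Face Hub CLI if you don't have it yet
pip install -U "huggingface_hub[cli]"

# Pre-fetch a pre-compiled model into the local Hugging Face cache;
# furiosa-llm serve can also resolve the repository ID directly, as shown below.
huggingface-cli download furiosa-ai/Llama-3.1-8B-Instruct-FP8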

Examples

First, install the prerequisites by following Installing Furiosa-LLM.

Then, run the following command to start the Furiosa-LLM server with the Llama-3.1-8B-Instruct-FP8 model:

furiosa-llm serve furiosa-ai/Llama-3.1-8B-Instruct-FP8
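
The server exposes an OpenAI-compatible HTTP API, which the examples below reach at localhost:8000. As a quick sanity check after startup, you can list the served models; this sketch assumes the standard /v1/models endpoint of OpenAI-compatible servers is available:

# List the models the running server exposes (assumes the standard
# OpenAI-compatible /v1/models endpoint)
curl http://localhost:8000/v1/models | python -m json.tool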

For reasoning models like DeepSeek-R1-Distill-Llama-8B, you can enable reasoning mode with the appropriate reasoning parser:

furiosa-llm serve furiosa-ai/DeepSeek-R1-Distill-Llama-8B \
  --enable-reasoning --reasoning-parser deepseek_r1

Once the server is running, you can query the model with input prompts:

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "EMPTY",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
    }' \
    | python -m json.tool
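
If you launched the server with the reasoning options above, the deepseek_r1 parser should separate the model's chain of thought from its final answer. A hedged sketch for inspecting both parts, assuming the response follows the common convention of returning the reasoning in a reasoning_content field alongside content (check the raw response JSON on your setup if it differs):

# Query a reasoning model and print the parsed reasoning and answer separately.
# The reasoning_content field name is an assumption based on the usual
# deepseek_r1 parser convention in OpenAI-compatible servers.
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "EMPTY",
    "messages": [{"role": "user", "content": "How many prime numbers are less than 20?"}]
    }' \
    | python -c "import json, sys; m = json.load(sys.stdin)['choices'][0]['message']; print('reasoning:', m.get('reasoning_content')); print('answer:', m.get('content'))"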

To learn more, see Quick Start with Furiosa-LLM.
