OpenCodeReasoning-Nemotron-1.1-32B-q3f16_1-MLC

This is the OpenCodeReasoning-Nemotron-1.1-32B model converted to MLC format with q3f16_1 quantization. The model can be used with MLC-LLM and WebLLM.

Example Usage

Before running the examples below, please follow the MLC-LLM installation guide.

Chat CLI

mlc_llm chat HF://JackBinary/OpenCodeReasoning-Nemotron-1.1-32B-q3f16_1-MLC

REST Server

mlc_llm serve HF://JackBinary/OpenCodeReasoning-Nemotron-1.1-32B-q3f16_1-MLC
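Once the server is up, any OpenAI-compatible client can talk to it. A minimal curl sketch, assuming the server's default address of 127.0.0.1:8000 (adjust the host and port if you passed different values to mlc_llm serve):

```shell
# Send a chat completion request to the local MLC REST server
# via its OpenAI-compatible endpoint.
curl -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "HF://JackBinary/OpenCodeReasoning-Nemotron-1.1-32B-q3f16_1-MLC",
    "messages": [{"role": "user", "content": "Write a function that reverses a string."}]
  }'
```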

Python API

from mlc_llm import MLCEngine

# Create the engine directly from the Hugging Face model URL;
# weights are downloaded and cached on first use.
model = "HF://JackBinary/OpenCodeReasoning-Nemotron-1.1-32B-q3f16_1-MLC"
engine = MLCEngine(model)

# Stream a chat completion, printing tokens as they arrive.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)
print("\n")

# Release engine resources when done.
engine.terminate()

Documentation

For more on MLC LLM, visit the MLC LLM documentation and GitHub repository.

Model tree

Base model: Qwen/Qwen2.5-32B (this model is one of 9 quantized variants).