

Simply make AI models cheaper, smaller, faster, and greener!

  • Give a thumbs up if you like this model!
  • Contact us and tell us which model to compress next here.
  • Request access to easily compress your own AI models here.
  • Read the documentation to learn more here.
  • Join the Pruna AI community on Discord here to share feedback and suggestions or to get help.

Frequently Asked Questions

  • How does the compression work? The model is compressed using bitsandbytes 4-bit quantization.
  • How does the model quality change? The quality of the model output will degrade slightly compared with the original model.
  • What is the model format? We use the standard safetensors format.
  • How can I compress my own models? You can request premium access to more compression methods and tech support for your specific use cases here.
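
To give an intuition for why 4-bit compression shrinks the model and slightly degrades quality, here is a minimal pure-Python sketch of blockwise absmax quantization. It is a simplified illustration only: it is not the bitsandbytes implementation (which uses its own 4-bit data types and GPU kernels), and the function names, block size, and sample weights are all assumptions made for this example.

```python
# Simplified illustration of blockwise 4-bit quantization.
# NOT the bitsandbytes algorithm; names and values are hypothetical.

LEVELS = 16               # 4 bits -> 16 representable codes
HALF = LEVELS // 2 - 1    # signed codes are kept in [-7, 7]


def quantize_block(values):
    """Scale a block by its absolute maximum and round to integer codes."""
    absmax = max(abs(v) for v in values) or 1.0  # avoid dividing by zero
    codes = [round(v / absmax * HALF) for v in values]
    return codes, absmax


def dequantize_block(codes, absmax):
    """Map integer codes back to approximate float values."""
    return [c / HALF * absmax for c in codes]


def quantize(weights, block_size=4):
    """Quantize a flat list of weights one block at a time."""
    return [
        quantize_block(weights[i:i + block_size])
        for i in range(0, len(weights), block_size)
    ]


def dequantize(blocks):
    """Reconstruct the approximate weights from all blocks."""
    restored = []
    for codes, absmax in blocks:
        restored.extend(dequantize_block(codes, absmax))
    return restored


weights = [0.12, -0.50, 0.33, 0.07, 1.20, -0.90, 0.05, 0.61]
blocks = quantize(weights)
restored = dequantize(blocks)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"max reconstruction error: {max_error:.4f}")
```

Each weight is stored as a small integer code plus one scale per block, which is where the size reduction comes from. The rounding step is lossy, so the restored weights differ slightly from the originals, which is consistent with the small quality drop mentioned above.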


Here's how you can run the model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the compressed model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("PrunaAI/stable-code-instruct-3b-bnb-4bit", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("PrunaAI/stable-code-instruct-3b-bnb-4bit", torch_dtype=torch.bfloat16, trust_remote_code=True)
model = model.cuda()

# Chat history: each message is a dict with a "role" and a "content" key.
messages = [
    {
        "role": "system",
        "content": "You are a helpful and polite assistant",
    },
    {
        "role": "user",
        "content": "Write a simple website in HTML. When a user clicks the button, it shows a random joke from a list of 4 jokes.",
    },
]

# Render the chat into a single prompt string using the model's chat template.
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# The generation settings below are assumed values; tune them for your use case.
tokens = model.generate(
    **inputs,
    max_new_tokens=1024,
)

# Decode only the newly generated tokens (everything after the prompt).
output = tokenizer.batch_decode(tokens[:, inputs.input_ids.shape[-1]:], skip_special_tokens=False)[0]
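
Because the snippet decodes with skip_special_tokens=False, the decoded string can still contain special-token text after the answer. One possible way to clean it up is sketched below. The helper name trim_at_markers, the sample string, and the marker strings are hypothetical and chosen only for illustration; they are not taken from this model's tokenizer.

```python
def trim_at_markers(text, markers):
    """Return text up to the earliest occurrence of any marker string."""
    cut = len(text)
    for marker in markers:
        if not marker:
            continue  # skip empty strings, which would match at index 0
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()


# Hypothetical decoded output and marker strings, for illustration only.
sample = "<html>...</html>\n<|end|><|pad|>"
print(trim_at_markers(sample, ["<|end|>", "<|pad|>"]))
```

With the real model, the markers could plausibly come from tokenizer.all_special_tokens, or you could pass skip_special_tokens=True to batch_decode instead if you do not need to inspect the special tokens at all.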

Credits & License

The license of the smashed model follows the license of the original model. Before using this model, please check the license of stabilityai/stable-code-instruct-3b, which provided the base model. The license of the pruna-engine is available here on PyPI.

Want to compress other models?

  • Contact us and tell us which model to compress next here.
  • Request access to easily compress your own AI models here.

Model size: 1.57B params