---
base_model: unsloth/phi-3.5-mini-instruct-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
---

# Model Summary

DeepPhi is a reasoning-focused Phi model and a top performer for its size of 3.8B parameters. Like Phi-3, it is trained on synthetic data and filtered publicly available websites, with a focus on very high-quality, reasoning-dense data. The model belongs to the Phi-3 model family and supports a 128K token context length.

# Run locally

### 4bit

After obtaining the DeepPhi-3.5-mini-instruct model checkpoint, users can use this sample code for 4-bit inference.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig

torch.random.manual_seed(0)

model_path = "EpistemeAI/DeepPhi-3.5-mini-instruct"

# Configure 4-bit quantization using bitsandbytes
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # You can also try "fp4" if desired.
    bnb_4bit_compute_dtype=torch.float16,  # Or torch.bfloat16 depending on your hardware.
)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
    quantization_config=quantization_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

messages = [
    {"role": "system", "content": """
You are a helpful AI assistant. Respond in the following format:
...
...
"""},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving a 2x + 3 = 7 equation?"},
]

# Simple fallback formatting: flatten the chat into "Role: content" lines.
def format_messages(messages):
    prompt = ""
    for msg in messages:
        role = msg["role"].capitalize()
        prompt += f"{role}: {msg['content']}\n"
    return prompt.strip()

prompt = format_messages(messages)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(prompt, **generation_args)
print(output[0]['generated_text'])
```

Two short variants of this example (one using the tokenizer's built-in chat template, one streaming tokens as they are generated) are sketched under "Additional examples" at the end of this card.

# Uploaded model

- **Developed by:** EpistemeAI
- **License:** apache-2.0
- **Finetuned from model:** unsloth/phi-3.5-mini-instruct-bnb-4bit

This model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
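# Additional examples

### Chat template

The `format_messages` helper in the 4-bit example is a plain-text fallback. If the checkpoint ships the standard Phi-3.5 chat template in its tokenizer config (an assumption, not verified here), the conversation can instead be formatted with `tokenizer.apply_chat_template` and generated directly with `model.generate`. This is a minimal sketch that reuses `model`, `tokenizer`, and `messages` from the 4-bit example above.

```python
# Minimal sketch, assuming the tokenizer ships a Phi-3.5 chat template.
# Reuses model, tokenizer, and messages from the 4-bit example above.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker before generating
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=500,
        do_sample=False,  # greedy decoding, matching the pipeline settings above
    )

# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```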
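### Streaming output

For interactive use, generation can stream tokens to stdout as they are produced. The sketch below uses transformers' `TextStreamer` and reuses `model`, `tokenizer`, and the plain-text `prompt` from the 4-bit example above; it is illustrative only, not part of the original recipe.

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated; skip_prompt avoids
# re-printing the input. Reuses model, tokenizer, and prompt from the 4-bit example.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    _ = model.generate(
        **inputs,
        max_new_tokens=500,
        do_sample=False,
        streamer=streamer,
    )
```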