alokabhishek committed · Commit f93c393 · verified · 1 Parent(s): c7fb6ca

updated readme with steps on how to run the model

Files changed (1): README.md (+39 −5)

README.md CHANGED
@@ -38,17 +38,51 @@ This repo contains 4-bit quantized (using ExLlamaV2) model of Meta's meta-llama/
 
 Use the code below to get started with the model.
 
 ## How to run from Python code
 
 #### First install the package
 
 #### Import
 
-
- #### Use a pipeline as a high-level helper
-
-
-
 
 ## Uses
 
 
 
 Use the code below to get started with the model.
 
+
 ## How to run from Python code
 
 #### First install the package
+ ```shell
+ # Install ExLlamaV2 from source
+ !git clone https://github.com/turboderp/exllamav2
+ !pip install -e exllamav2
+ ```
 
 #### Import
 
+ ```python
+ # login / HfApi / create_repo are Hugging Face Hub helpers;
+ # login() authenticates your session for gated or private repos
+ from huggingface_hub import login, HfApi, create_repo
+ from torch import bfloat16
+ import locale
+ import torch
+ import os
+ ```
+
+ #### Set up variables
+
+ ```python
+ # Define the model ID for the desired model
+ model_id = "alokabhishek/Llama-2-7b-chat-hf-5.0-bpw-exl2"
+ BPW = 5.0
+
+ # Derive the local directory name from the repo ID;
+ # the repo name already carries the "-5.0-bpw-exl2" suffix,
+ # so it is used as-is for the quantized model directory
+ model_name = model_id.split("/")[-1]
+ quant_name = model_name
+ ```
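As a standalone sanity check of the string handling above (plain Python, using the same `model_id`), `split("/")` isolates the repo name that becomes the local directory:

```python
model_id = "alokabhishek/Llama-2-7b-chat-hf-5.0-bpw-exl2"

# The repo ID has the form "owner/repo"; the part after the
# slash is used as the local directory name.
model_name = model_id.split("/")[-1]
print(model_name)  # Llama-2-7b-chat-hf-5.0-bpw-exl2
```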
+
+ #### Download the quantized model
+ ```shell
+ !git-lfs install
+ # download the model to a local directory; in a notebook, {username},
+ # {HF_TOKEN}, {model_id}, and {quant_name} are interpolated from Python variables
+ !git clone https://{username}:{HF_TOKEN}@huggingface.co/{model_id} {quant_name}
+ ```
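In a Jupyter/Colab notebook, the `{...}` placeholders in the `!git clone` line above are interpolated from Python variables. Written as plain Python, the authenticated clone URL is assembled like this (the `username` and `hf_token` values here are hypothetical placeholders; embedding credentials in the URL lets git authenticate to gated or private repos):

```python
model_id = "alokabhishek/Llama-2-7b-chat-hf-5.0-bpw-exl2"
username = "your-username"   # hypothetical placeholder
hf_token = "your-hf-token"   # hypothetical placeholder

# git clone <clone_url> <local_dir> is what the notebook cell runs
clone_url = f"https://{username}:{hf_token}@huggingface.co/{model_id}"
print(clone_url)
```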
+
+ #### Run inference on the quantized model using ExLlamaV2's test_inference.py
+ ```shell
+ # Run model
+ !python exllamav2/test_inference.py -m {quant_name}/ -p "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."
+ ```
 
 ## Uses