alokabhishek committed · Commit f93c393 · verified · 1 Parent(s): c7fb6ca

updated readme with steps on how to run the model

Files changed (1): README.md (+39 −5)

README.md CHANGED
@@ -38,17 +38,51 @@ This repo contains 4-bit quantized (using ExLlamaV2) model of Meta's meta-llama/
 
 Use the code below to get started with the model.
 
 ## How to run from Python code
 
 #### First install the package
 
 #### Import
 
-
- #### Use a pipeline as a high-level helper
-
-
-
 
 ## Uses
 
 
 
 Use the code below to get started with the model.
 
+
 ## How to run from Python code
 
 #### First install the package
+ ```shell
+ # Install ExLlamaV2 from source
+ !git clone https://github.com/turboderp/exllamav2
+ !pip install -e exllamav2
+ ```
 
 #### Import
 
+ ```python
+ # login / HfApi / create_repo are Hugging Face Hub helpers;
+ # login() authenticates your session for gated or private repos
+ from huggingface_hub import login, HfApi, create_repo
+ from torch import bfloat16
+ import locale
+ import torch
+ import os
+ ```
+
+ #### Set up variables
+
+ ```python
+ # Define the model ID for the desired model
+ model_id = "alokabhishek/Llama-2-7b-chat-hf-5.0-bpw-exl2"
+ BPW = 5.0
+
+ # Derive the local directory name from the repo ID;
+ # the repo name already carries the "-5.0-bpw-exl2" suffix,
+ # so it is used as-is for the quantized model directory
+ model_name = model_id.split("/")[-1]
+ quant_name = model_name
+ ```
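As a standalone sanity check of the string handling above (plain Python, using the same `model_id`), `split("/")` isolates the repo name that becomes the local directory:

```python
model_id = "alokabhishek/Llama-2-7b-chat-hf-5.0-bpw-exl2"

# The repo ID has the form "owner/repo"; the part after the
# slash is used as the local directory name.
model_name = model_id.split("/")[-1]
print(model_name)  # Llama-2-7b-chat-hf-5.0-bpw-exl2
```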
+
+ #### Download the quantized model
+ ```shell
+ !git-lfs install
+ # download the model to a local directory; in a notebook, {username},
+ # {HF_TOKEN}, {model_id}, and {quant_name} are interpolated from Python variables
+ !git clone https://{username}:{HF_TOKEN}@huggingface.co/{model_id} {quant_name}
+ ```
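In a Jupyter/Colab notebook, the `{...}` placeholders in the `!git clone` line above are interpolated from Python variables. Written as plain Python, the authenticated clone URL is assembled like this (the `username` and `hf_token` values here are hypothetical placeholders; embedding credentials in the URL lets git authenticate to gated or private repos):

```python
model_id = "alokabhishek/Llama-2-7b-chat-hf-5.0-bpw-exl2"
username = "your-username"   # hypothetical placeholder
hf_token = "your-hf-token"   # hypothetical placeholder

# git clone <clone_url> <local_dir> is what the notebook cell runs
clone_url = f"https://{username}:{hf_token}@huggingface.co/{model_id}"
print(clone_url)
```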
+
+ #### Run inference on the quantized model using ExLlamaV2's test_inference.py
+ ```shell
+ # Run model
+ !python exllamav2/test_inference.py -m {quant_name}/ -p "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."
+ ```
 
 ## Uses