mwitiderrick committed on
Commit 00d154a · verified · 1 Parent(s): a4779a1

Update README.md

Files changed (1)
  1. README.md +60 -12
README.md CHANGED
@@ -1,11 +1,63 @@
  ```python
  import sparseml.transformers

  original_model_name = "Xenova/llama2.c-stories110M"
  output_directory = "output/"
- final_model_name = "nm-testing/llama2.c-stories110M-pruned2.4"
-
- dataset = "open_platypus"

  recipe = """
  test_stage:
@@ -20,17 +72,13 @@ test_stage:

  # Apply SparseGPT to the model
  sparseml.transformers.oneshot(
-     model_name_or_path=original_model_name,
-     dataset_name=dataset,
      recipe=recipe,
      output_dir=output_directory,
  )

- # Upload the output model to Hugging Face Hub
- from huggingface_hub import HfApi

- HfApi().upload_folder(
-     folder_path=output_directory,
-     repo_id=final_model_name,
- )
- ```
+ ---
+ base_model: Xenova/llama2.c-stories110M
+ inference: true
+ model_type: llama
+ quantized_by: mgoin
+ tags:
+ - nm-vllm
+ - sparse
+ ---
+
+ ## llama2.c-stories110M-pruned50
+ This repo contains model files for [llama2.c 110M tinystories](https://huggingface.co/Xenova/llama2.c-stories110M) optimized for [NM-vLLM](https://github.com/neuralmagic/nm-vllm), a high-throughput serving engine for compressed LLMs.
+
+ This model was pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
+
+ ## Inference
+ Install [NM-vLLM](https://github.com/neuralmagic/nm-vllm) for fast inference and low memory usage:
+ ```bash
+ pip install nm-vllm[sparse]
+ ```
+ Run in a Python pipeline for local inference:
+ ```python
+ from vllm import LLM, SamplingParams
+
+ model = LLM("nm-testing/llama2.c-stories110M-pruned2.4", sparsity="sparse_w16a16")
+ prompt = "My name is "
+ formatted_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"
+
+ sampling_params = SamplingParams(max_tokens=100, temperature=0)
+ outputs = model.generate(prompt, sampling_params=sampling_params)
+ print(outputs[0].outputs[0].text)
+
+ """
+ 3 years old. My name is Sam. I love to play with my toys. I love to play with my toys.
+ One day, my mom takes me to the park. She brings a big bag. She takes out a big bag. It is full of things.
+ At the park, Sam sees a big box. He sees it was made from paper. He sees it is made from paper. He sees it is made from paper.
+ Sam's mom takes outs
+ """
+ ```
+
+ ## Prompt template
+
+ N/A
+
+ ## Sparsification
+ For details on how this model was sparsified, see the `recipe.yaml` in this repo and follow the instructions below.
+
+ Install [SparseML](https://github.com/neuralmagic/sparseml):
+ ```bash
+ git clone https://github.com/neuralmagic/sparseml
+ pip install -e "sparseml[transformers]"
+ ```
+
+ Replace the recipe as you like and run this one-shot compression script to apply SparseGPT:
  ```python
  import sparseml.transformers

  original_model_name = "Xenova/llama2.c-stories110M"
+ calibration_dataset = "open_platypus"
  output_directory = "output/"

  recipe = """
  test_stage:
@@ -20,17 +72,13 @@ test_stage:

  # Apply SparseGPT to the model
  sparseml.transformers.oneshot(
+     model=original_model_name,
+     dataset=calibration_dataset,
      recipe=recipe,
      output_dir=output_directory,
  )
+ ```

+ ## Slack

+ For further support and discussions about these models and AI in general, join [Neural Magic's Slack Community](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).
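
A quick way to sanity-check a pruned checkpoint like the one produced by the Sparsification script is to measure what fraction of its weights are exactly zero. The sketch below is not part of this commit or of SparseML: `zero_fraction` and `toy_layer` are hypothetical names, and nested Python lists stand in for real weight tensors. It also assumes the `50` in `pruned50` refers to a 50% sparsity target, which the README does not state explicitly.

```python
from typing import Iterable


def zero_fraction(weights: Iterable[Iterable[float]]) -> float:
    """Return the fraction of entries in a 2-D weight matrix that are exactly zero.

    Hypothetical helper for illustration only; not a SparseML or vLLM API.
    """
    total = 0
    zeros = 0
    for row in weights:
        for value in row:
            total += 1
            if value == 0:
                zeros += 1
    if total == 0:
        raise ValueError("weight matrix is empty")
    return zeros / total


# Toy stand-in for one pruned layer: 4 of the 8 entries are zero.
toy_layer = [
    [0.0, 0.7, 0.0, -1.2],
    [0.3, 0.0, 0.9, 0.0],
]
print(f"sparsity: {zero_fraction(toy_layer):.2f}")  # prints "sparsity: 0.50"
```

For an actual model, the same count could be run over each layer's weight matrix after loading the files written to `output/`; a result far below the target would suggest the recipe was not applied as intended.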