alaeddine-13 committed
Commit 400c001 · 1 Parent(s): a7b6c41

add usage example

Files changed (1)
  1. README.md +34 -0
README.md CHANGED
@@ -35,6 +35,40 @@ The results (on the human eval benchmark) are on par with other open-source mode
 
  It still underperforms models such as Code Llama (53%), GPT-4 (82%), and WizardCoder (73.2%), but these models are more than 30 times bigger.
 
+ ## Usage
+ You can download and use the model like so:
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "jinaai/starcoder-1b-textbook", device_map='auto'
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained("jinaai/starcoder-1b-textbook")
+
+ prompt = '''
+ def unique(l: list):
+     """Return sorted unique elements in a list
+     >>> unique([5, 3, 5, 2, 3, 3, 9, 0, 123])
+     [0, 2, 3, 5, 9, 123]
+     """
+ '''
+
+ # Send the inputs to whichever device device_map='auto' placed the model on.
+ inputs = tokenizer(prompt.rstrip(), return_tensors="pt").to(model.device)
+
+ generation_output = model.generate(
+     **inputs,
+     max_new_tokens=128,
+     eos_token_id=tokenizer.eos_token_id,
+     return_dict_in_generate=True,
+ )
+
+ s = generation_output.sequences[0]
+ output = tokenizer.decode(s, skip_special_tokens=True)
+
+ print(output)
+ ```
+
  ## Finetuning details
 
  We did full-parameter fine-tuning on an NVIDIA A40 for 12 hours, using a batch size of 128 and a micro-batch size of 8.
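
The batch-size figures above can be related with a little arithmetic. The sketch below assumes that the batch size of 128 is reached by accumulating gradients over micro-batches of 8 on a single GPU; the README does not state this, and the helper name `accumulation_steps` is hypothetical.

```python
def accumulation_steps(batch_size: int, micro_batch_size: int, num_devices: int = 1) -> int:
    """Number of micro-batches accumulated before each optimizer step.

    Assumption (not stated in the README): the effective batch size equals
    micro_batch_size * num_devices * accumulation steps.
    """
    samples_per_pass = micro_batch_size * num_devices
    if batch_size % samples_per_pass != 0:
        raise ValueError("batch_size must be a multiple of micro_batch_size * num_devices")
    return batch_size // samples_per_pass


# Figures taken from the README: batch size 128, micro-batch size 8, one A40.
steps = accumulation_steps(batch_size=128, micro_batch_size=8)
print(steps)
```

Under that assumption, each optimizer update would see the gradients of several forward/backward passes rather than one, which is a common way to fit a large effective batch on a single card.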