Can you provide the data processing demo used before training the LLMs?
#7 · opened by scall
Here is my data processing demo:
```python
import torch

# `tokenizer` and `SEQ_LEN` are assumed to be defined elsewhere
# (a LLaMA-style tokenizer whose pad token id is 0, and the model context length).
def generate_and_tokenize_prompt2(examples, CUTOFF_LEN=SEQ_LEN):
    instruction, input_text = examples["instruction"].strip(), examples["input"].strip()
    # Only include the input section when it is non-empty.
    if len(input_text) > 0:
        user_prompt = f"User:{instruction}\n###{input_text}\n\nAssistant: \n"
    else:
        user_prompt = f"User:{instruction}\n\nAssistant: \n"
    # Number of prompt tokens (no eos token); these positions are excluded from the loss.
    len_user_prompt_tokens = len(tokenizer(user_prompt, truncation=True, max_length=CUTOFF_LEN + 1)["input_ids"]) - 1
    # Tokenize prompt + response, padded to max length; drop the last token.
    full_tokens = tokenizer(user_prompt + examples["output"], truncation=True, max_length=CUTOFF_LEN + 1, padding="max_length")["input_ids"][:-1]
    # Mask the prompt tokens and padding (pad id 0) with -100 so they are ignored by the loss.
    labels = [-100] * len_user_prompt_tokens + [tok if tok != 0 else -100 for tok in full_tokens[len_user_prompt_tokens:]]
    return {"input_ids": full_tokens, "labels": torch.LongTensor(labels), "attention_mask": torch.LongTensor([1] * len(full_tokens))}
```
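For reference, this is roughly how I apply the function; a minimal sketch assuming a Hugging Face `datasets` dataset with `instruction`/`input`/`output` columns (the `train.json` file name is a placeholder, not my actual data file):

```python
from datasets import load_dataset

# "train.json" is a placeholder; the actual instruction data file is not shown here.
data = load_dataset("json", data_files="train.json")["train"]

# The function handles one example at a time (it calls .strip() on single strings),
# so map it without batched=True and drop the raw text columns afterwards.
train_data = data.map(generate_and_tokenize_prompt2, remove_columns=data.column_names)
```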
I use this data processing to instruction-tune manticore-13b, and the training loss is initially very large (around 78.0), whereas your training loss is very small (https://wandb.ai/wing-lian/manticore-13b/runs/nq3u3uoh/workspace).
How high is your learning rate set?
The batch size is 48, running on a single node with 8 GPUs; the learning rate is 1e-5 and there are 200 warmup steps.
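For concreteness, here is a minimal sketch of how those hyperparameters might map onto `transformers.TrainingArguments`, assuming the 48 is a per-device batch size; the output directory, epoch count, and precision flag below are placeholders rather than the actual training config:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./manticore-13b-sft",  # placeholder output path
    per_device_train_batch_size=48,    # 48 per GPU x 8 GPUs on one node
    learning_rate=1e-5,
    warmup_steps=200,
    num_train_epochs=3,                # placeholder; epoch count not stated in the thread
    bf16=True,                         # placeholder precision setting
    logging_steps=10,
)
```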