# chatglm-maths

Fine-tuning, LoRA, PPO, and inference for chatglm-6b. The samples are automatically generated addition, subtraction, multiplication, and division problems over integers and decimals. Runs on GPU or CPU.

# GitHub

[https://github.com/yongzhuo/chatglm-maths](https://github.com/yongzhuo/chatglm-maths)

## Pitfalls

```python
1. eps=1e-5 (do not make it smaller), half precision float16, and LN uses Post-LN (better generalization) + DeepNorm
   [sigh, there is also an LN before Attention]; the purpose is to prevent gradient overflow and similar problems in large models;
2. Model input/output: the default tokenization_chatglm.py/modeling_chatglm.py cannot be used as-is for training,
   because they are set up entirely for generation (generate); you have to build all the input arguments yourself,
   or modify the files yourself to fit;
   2.1 In ChatGLMModel, get_masks() is fine; in get_position_ids(), change
       'context_length = seq.index(150004) + 1' to 'context_length = len(seq)';
   2.2 Training input_ids format, for now (post-padding for training, pre-padding for inference
       [tokenization_chatglm.py defaults to pre-padding]):
       x: prompt_1 + "_" + text_1 + "\n" + prompt_2 + [gMASK] + [BOS] + "_" + text_2 + [PAD]*N
   2.3 Training label_ids format, for now (CrossEntropyLoss ignores -100 by default, so those positions do not contribute to the loss):
       y = [-100]*len(text_1) + [BOS] + text_2 + [EOS] + [-100]*N
   2.4 Watch the position ids and mask (the built-in ones only handle inference with batch_size=1, so the training inputs
       still have to be written by hand); see the README.md of GLM-130B, or the GLM-1 source code at
       https://github.com/THUDM/GLM/blob/main/tasks/seq2seq/dataset.py
3. Note that the chatglm-6b weights are float16, but the loss is computed in float32 and then cast back to float16
   for the gradient update;
4. ChatGLMTokenizer sometimes raises odd errors; set max_new_tokens when generating, at most {"max_new_tokens": 2048};
   decode sometimes encounters ids that do not exist;
5. Low-rank adaptation (LoRA), RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
   Try: upgrade transformers to the latest version, call .cuda() after get_peft_model, device_map={'':torch.cuda.current_device()}.
```

## Fine-tuning data

1. The raw data comes from [https://github.com/LYH-YF/MWPToolkit](https://github.com/LYH-YF/MWPToolkit)
2. The processed fine-tuning data (arithmetic expressions / equation solving), MWP: [https://huggingface.co/datasets/Macropodus/MWP-Instruct](https://huggingface.co/datasets/Macropodus/MWP-Instruct)
3. Large-number addition, subtraction, multiplication, and division comes from: [https://github.com/liutiedong/goat.git](https://github.com/liutiedong/goat.git)

## LoRA weights

```shell
Baichuan-7B-GPT4ForALL: https://huggingface.co/Macropodus/MWP-Instruct
Bloomz-7B-GPT4ForALL: https://huggingface.co/Macropodus/MWP-Instruct
ChatGLM-6B-GPT4ForALL: https://huggingface.co/Macropodus/MWP-Instruct
LlaMA-7B-GPT4ForALL: https://huggingface.co/Macropodus/MWP-Instruct
ChatGLM-6B-MWP: https://huggingface.co/Macropodus/MWP-Instruct
```

## Datasets (Chinese)

- [https://github.com/tatsu-lab/stanford_alpaca](https://github.com/tatsu-lab/stanford_alpaca)
- [https://github.com/LianjiaTech/BELLE](https://github.com/LianjiaTech/BELLE)
- [https://github.com/carbonz0/alpaca-chinese-dataset](https://github.com/carbonz0/alpaca-chinese-dataset)

## Environment

```shell
transformers>=4.26.1
cpm_kernels==1.0.11
icetk==0.0.4
torch>=1.10.1
rouge==1.0.1
nltk==3.6.6
peft>=0.2.0
numpy
tqdm
lion_pytorch
macropodus
trl>=0.4.1
```

## Fine-tuning: arithmetic problems

```shell
lora
Fine-tune: python c00_toy_lora_train_6b.py
Inference: python p00_toy_lora_predict_6b.py

ppo
Train: python t10_toy_trl_train_ppo.py
Test: python t10_toy_trl_predict_ppo.py

6b
Fine-tune: python c00_toy_cpu_train_6b.py
Inference: python p00_toy_cpu_predit_6b.py

small-layer
Fine-tune: python c01_toy_cpu_train_small.py
Inference: python p01_toy_cpu_predict_small.py
```

## References / Acknowledgements

- [https://github.com/THUDM/ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B)
- [https://github.com/THUDM/GLM](https://github.com/THUDM/GLM)
- [https://github.com/tatsu-lab/stanford_alpaca](https://github.com/tatsu-lab/stanford_alpaca)
- [https://github.com/LianjiaTech/BELLE](https://github.com/LianjiaTech/BELLE)
- [https://github.com/huggingface/peft](https://github.com/huggingface/peft)
- [https://github.com/mymusise/ChatGLM-Tuning](https://github.com/mymusise/ChatGLM-Tuning)
- [https://github.com/bojone/bert4keras](https://github.com/bojone/bert4keras)
- [trl](https://github.com/lvwerra/trl)
- [math23k](https://aclanthology.org/D17-1088)

## Inference log (toy)

```cpu
generator_calculate_line: ('13+75=', '13+75=88')
tokenizer.vocab_size: 150344
eval: 0%| | 0/1 [00:00