Upcycling Experiments - a gabrielmbmb Collection

gabrielmbmb 's Collections

Upcycling Papers

Synthetic Data Papers

Upcycling Experiments

LLM Leaderboards

Upcycling Experiments

updated Sep 20, 2024

Models I pre-trained initialising SMoE models using dense model weights and the upcycling process used for Qwen1.5-MoE2.7BA (or something similar)

gabrielmbmb/Upcycled-Qwen1.5-MoE2.7B

Text Generation • 14B • Updated Mar 30, 2024 • 10 • 2

Note This model hasn't been trained, just initialised using upcycling process and the weights from Qwen/Qwen1.5-18B.
gabrielmbmb/Upcycled-Qwen1.5-MoE2.7B-LoRA

Updated Mar 30, 2024 • 1
gabrielmbmb/Upcycled-Qwen1.5-MoE2.7B-LoRA-merged

Text Generation • 14B • Updated Mar 30, 2024 • 6

Note Using LoRA and targeting up_proj, gate_proj, down_proj, gate, and shared_expert_gate. LoRA rank was 8. About 126M trainable parameters. Dataset used was wiki_demo from LLaMa Factory.
gabrielmbmb/Upcycled-Qwen1.5-MoE2.7B-LoRA-merged-32

Text Generation • 14B • Updated Mar 31, 2024 • 6

Note Using LoRA targeting all the layers and rank 32. About 500m trainable parameters.
gabrielmbmb/Upcycled-Qwen1.5-MoE2.7B-LoRA-merged-32-2000-steps

Text Generation • 14B • Updated Mar 31, 2024 • 9
gabrielmbmb/Upcycled-Qwen1.5-MoE2.7B-LoRA-merged-32-2000-steps-adapter

Updated Mar 31, 2024 • 2