
Training Instructions missing 'qwen2' package

#2
by codys12 - opened

I tried training with the provided instructions, but I get this error:

INFO: ########## work in progress ##########
INFO: 
############################################################################
#
# Model qwerky7_qwen2 BF16 on 1x8 GPU, bsz 1x8x12=96, fsdp with grad_cp
#
# Data = data/dclm-10B (binidx), ProjDir = out/L28-D3584-qwerky7_qwen2-4
#
# Epoch = 0 to 71 (will continue afterwards), save every 1 epoch
#
# Each "epoch" = 420 global steps, 40320 samples, 20643840 tokens
#
# Model = 28 n_layer, 3584 n_embd, 512 ctx_len
#
# Adam = lr 7e-06 to 7e-06, warmup 50 steps, beta (0.9, 0.95), eps 1e-08
#
# Found torch 2.7.0+cu126, recommend latest torch
# Found deepspeed None, recommend latest deepspeed
# Found lightning 2.5.1.post0, requires 2+
#
############################################################################

INFO: {}

INFO: [rank: 0] Seed set to 1337
[2025-06-01 14:58:04,031] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Warning: The cache directory for DeepSpeed Triton autotune, /home/steinmetzc/.triton/autotune, appears to be on an NFS system. While this is generally acceptable, if you experience slowdowns or hanging when DeepSpeed exits, it is recommended to set the TRITON_CACHE_DIR environment variable to a non-NFS path.
RWKV_MODEL_TYPE qwerky7
Traceback (most recent call last):
    from qwen2.configuration_qwen2 import Qwen2Config
ModuleNotFoundError: No module named 'qwen2'
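The failing import suggests the training script expects a local `qwen2` package that exposes the Qwen2 config class. As a possible stopgap (an assumption on my part, not the repo's documented setup), a minimal shim could re-export `Qwen2Config` from `transformers`, which ships the same class:

```python
# Hypothetical shim: qwen2/configuration_qwen2.py (plus an empty qwen2/__init__.py)
# placed on the Python path. It only forwards Qwen2Config, which transformers
# (>= 4.37) provides; anything else the repo's qwen2 package may define is not covered.
from transformers.models.qwen2.configuration_qwen2 import Qwen2Config

__all__ = ["Qwen2Config"]
```

Whether the rest of the training code needs more than the config class from that package is unclear from the traceback, so this may only move the failure further along.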
SmerkyG (recursal org)

This looks like some other kind of training run on a very large amount of data? I'm not quite sure whether this is a conversion or something else. If you're using the RADLADS training code, please file an issue there rather than on the Hugging Face models, and we will do our best to address it. Thanks!

SmerkyG changed discussion status to closed
SmerkyG (recursal org)

Sorry, I just realized we did not have issues enabled on the code repo! I've enabled that there if you'd like to follow up on this.
