[2025-02-01 18:47:39,024][oumi][rank1][pid:11751][MainThread][INFO]][train.py:144] Resolved 'training.dataloader_num_workers=auto' to 'training.dataloader_num_workers=8'
[2025-02-01 18:47:39,328][oumi][rank1][pid:11751][MainThread][INFO]][models.py:180] Building model for distributed training (world_size: 4)...
[2025-02-01 18:47:39,328][oumi][rank1][pid:11751][MainThread][INFO]][models.py:185] Building model using device_map: cuda:1 (DeviceRankInfo(world_size=4, rank=1, local_world_size=4, local_rank=1))...
[2025-02-01 18:47:39,328][oumi][rank1][pid:11751][MainThread][INFO]][models.py:255] Using model class: to instantiate model.
[2025-02-01 18:47:41,530][oumi][rank1][pid:11751][MainThread][INFO]][base_map_dataset.py:68] Creating map dataset (type: TextSftJsonLinesDataset) dataset_name: 'text_sft_jsonl', dataset_path: 'None'...
[2025-02-01 18:47:41,663][oumi][rank1][pid:11751][MainThread][INFO]][base_map_dataset.py:297] TextSftJsonLinesDataset: features=dict_keys(['input_ids', 'attention_mask'])
[2025-02-01 18:47:47,716][oumi][rank1][pid:11751][MainThread][INFO]][base_map_dataset.py:361] Finished transforming dataset (TextSftJsonLinesDataset)! Speed: 1652.20 examples/sec. Examples: 10000. Duration: 6.1 sec. Transform workers: 1.
[2025-02-01 18:47:47,984][oumi][rank1][pid:11751][MainThread][INFO]][torch_profiler_utils.py:150] PROF: Torch Profiler disabled!
[2025-02-01 18:47:48,077][oumi][rank1][pid:11751][MainThread][INFO]][device_utils.py:283] GPU Metrics Before Training: GPU runtime info: NVidiaGpuRuntimeInfo(device_index=0, device_count=4, used_memory_mb=7019.0, temperature=33, fan_speed=None, fan_speeds=None, power_usage_watts=70.637, power_limit_watts=400.0, gpu_utilization=0, memory_utilization=0, performance_state=0, clock_speed_graphics=1155, clock_speed_sm=1155, clock_speed_memory=1593).
[2025-02-01 18:47:48,078][oumi][rank1][pid:11751][MainThread][INFO]][train.py:312] Training init time: 10.795s
[2025-02-01 18:47:48,078][oumi][rank1][pid:11751][MainThread][INFO]][train.py:313] Starting training... (TrainerType.TRL_SFT, transformers: 4.45.2)
[2025-02-01 18:52:35,469][oumi][rank1][pid:11751][MainThread][INFO]][train.py:320] Training is Complete.
[2025-02-01 18:52:35,498][oumi][rank1][pid:11751][MainThread][INFO]][device_utils.py:283] GPU Metrics After Training: GPU runtime info: NVidiaGpuRuntimeInfo(device_index=0, device_count=4, used_memory_mb=21283.0, temperature=43, fan_speed=None, fan_speeds=None, power_usage_watts=181.852, power_limit_watts=400.0, gpu_utilization=54, memory_utilization=14, performance_state=0, clock_speed_graphics=1410, clock_speed_sm=1410, clock_speed_memory=1593).
[2025-02-01 18:52:35,498][oumi][rank1][pid:11751][MainThread][INFO]][torch_utils.py:117] Peak GPU memory usage: 17.24 GB
[2025-02-01 18:52:35,498][oumi][rank1][pid:11751][MainThread][INFO]][train.py:327] Saving final state...
[2025-02-01 18:52:35,504][oumi][rank1][pid:11751][MainThread][INFO]][train.py:332] Saving final model...
[2025-02-01 18:52:43,653][oumi][rank1][pid:11751][MainThread][INFO]][train.py:339] » We're always looking for feedback. What's one thing we can improve? https://oumi.ai/feedback