	Commit History
b7b619a  feat(train): log norm and histograms (#143)
7939874  feat(data): super conditioning (#141)
803ccbf  feat: support pod (#139)
2e02683  fix: no gradient checkpointing for new model
b798ed3  feat: no gradient checkpointing for params init
bc4734f  fix(train): consider schedule offset
9f5e879  feat(train): local jax cache
d368fb6  feat: add bucket reference to artifact
d5d442a  style: lint
5173ec7  feat: handle gradient checkpointing
1c4e839  feat: load from bucket
50498e6  feat(train): save to bucket
34cf91c  feat: reduce artifact space + offset step
5f954fc  feat: restore weights on CPU
4cb21dd  feat(train): simplify tokenizer loading
da9367c  feat(train): use compilation cache
7cfe576  feat: log num_parameters early
0952927  feat(train): handle multiple nodes (#130)
1bb3269  feat: handle model parallel
5f28cd2  feat(train): more custom x-axis
225b6ff  fix(train): opt_state_shape for distributed_shampoo
fa5b058  feat(train): split artifact into model/state
14abe8c  feat(train): another 25% faster
2b7f5f1  feat(train): overhead from 70% to 1% 🥳
7b5868f  feat(pjit): follow t5x style
00710bc  fix(train): grads spec
f254058  feat(train): improve pjit speed
b7c7458  fix(train): consider correct batch size
8149924  feat(train): custom start_preconditioning_step
032f623  feat(train): handle distributed_shampoo in pjit
cc34d07  feat(train): distributed_shampoo with pjit
f044cb8  fix style
1bfc1b5  feat(train): restore opt_state efficiently
12f323d  feat(model): clean way to load on cpu
3d43591  feat(train): load model on CPU
2d212d8  feat(train): different rng per node
df1fe19  feat(train): no batch dimension with pjit
49597a2  feat(train): progress on pjit
0081723  feat(train): start pjit support
f69b21b  Load from wandb artifact (#121)
ae983d7  Use DalleBartTokenizer. State restoration reverted to previous method: (Pedro Cuenca)
4c87adf  fix(train): variable not defined
a2bf605  feat(train): cleanup args
274ba73  refactor(train): cleanup
2d07559  feat: custom gradient accumulation
df01fa8  fix: style
4fa53a5  feat(train): use MultiSteps for gradient accumulation
9f522b8  Accept changes suggested by linter. (Pedro Cuenca)
290e443  Update help string for `model_name_or_path`. (Pedro Cuenca)