Commit History
d85d494  report min length of tokenized data (#1186) [skip ci]
814aee6  Phi2 multipack (#1173)
6840381  Add desc to map/filter (#1162)
e799e08  Falcon embeddings (#1149) [skip docker]
32580c1  VRAM fix attempt (#1164) [skip ci]
2ce5c0d  Deprecate max packed sequence len (#1141)
6910e6a  Multipack simplify for Mixtral (#1142)
7570446  Preprocess dataset size fix (#1131)
2f2582e  additional logging to get maximum token length of a sequence in the dataset (#1066) [skip ci]
553c80f  streaming multipack for pretraining dataset (#959)
f243c21  RL/DPO (#935)
5ea3aa3  Fix Deepspeed loading (#950)
40a6362  support for mamba (#915)
71b7ea3  Determine FSDP/deepspeed settings on device select. (#883)
797f3dd  don't train if eval split is too small (#873)
1470650  various bugfixes (#856)
641e6f7  multipack w batch sampler (#795)
b2430ce  use accelerate logging for zero/main logging only
4c834bf  cleanup verbosity a bit
05bd6f1  Threaded MultipackDistributedDataloader with prefetched samples (#759)
6c81c61  refactor setup trainer so we can add more hooks (#773)
3553172  fixes for alpaca w chatml, and don't include attention_mask w mistral for flash attention (#728)
490923f  Save Axolotl config as WandB artifact (#716) (Jan Philipp Harries)
2642cae  refactor to set eval_batch_size earlier if unset, so we can warn if mismatched (#662)
9ec2077  Make dataset_processes configurable (#651)
383f88d  Fix(cfg): Add validation for save_strategy and eval_strategy (#633)
e8cbf50  attention_mask not needed for training (#642)
d5f8589  chore(callback): Remove old peft saving code (#510)
03e5907  misc fixes to add gptq tests (#621)
2844eb2  run eval on the first step to get a baseline (#617)
31b9e0c  minor tweaks to simplify (#597)
b15b19e  gather/broadcast the max value of the packing efficiency automatically (#463)
ab534d7  don't add position_ids for evals (#591)
21ec195  optionally configure sample packing for evals (#589)
3fbde76  fix save_steps so it doesn't get duplicated (#567)
36e53c7  improve how we setup eval/save strategies and steps (#547)
e5bb22a  add optimization for group-by-len (#563)
5b67ea9  Add training callback to send predictions to WandB table (#521)
e30f1e3  Early stopping metric (#537)
a546ca2  misc fixes/improvements (#513)
3355706  Add support for GPTQ using native transformers/peft (#468)
7710e81  log supervised token count (#448)
396a7a7  Added advanced DDP args (#515) (Jan Philipp Harries)