Commits · Dovakiins/qwerrwe

Phi2 multipack (#1173)

814aee6
unverified

winglian commited on Jan 23, 2024

DPO cleanup (#1126)

7523d1f
unverified

winglian

plaguss commited on Jan 23, 2024

Falcon embeddings (#1149) [skip docker]

e799e08
unverified

winglian commited on Jan 23, 2024

jupyter lab fixes (#1139) [skip ci]

eaaeefc
unverified

winglian commited on Jan 22, 2024

Qwen2 (#1166)

f5a828a
unverified

winglian commited on Jan 22, 2024

make sure the model config loader respects the model_revision too (#1160) [skip-ci]

fccb542
unverified

winglian commited on Jan 22, 2024

Deprecate max packed sequence len (#1141)

2ce5c0d
unverified

winglian commited on Jan 20, 2024

Multipack simplify for Mixtral (#1142)

6910e6a
unverified

winglian commited on Jan 18, 2024

Add shifted sparse attention (#973) [skip-ci]

1d70f24
unverified

jrc joecummings

winglian commited on Jan 18, 2024

Add `layers_to_transform` for `lora_config` (#1118)

8487b97
unverified

xzuyn commited on Jan 16, 2024

keep gate in fp32 for 16 bit loras (#1105)

da97285
unverified

winglian commited on Jan 12, 2024

add gptneox embeddings, fix phi2 inputs, also fix the casting (#1083)

78c5b19
unverified

winglian commited on Jan 11, 2024

be more robust about checking embedding modules for lora finetunes (#1074) [skip ci]

0f10080
unverified

winglian commited on Jan 10, 2024

fix: torch_dtype mistral default to fp32 (#1050)

c3e8165
unverified

Nanobit commited on Jan 9, 2024

Phi2 rewrite (#1058)

732851f
unverified

winglian commited on Jan 8, 2024

feature: better device mapping for large models (#918)

bdfefaf
unverified

kallewoof Karl-Johan Alm

winglian commited on Jan 5, 2024

RL/DPO (#935)

f243c21

winglian commited on Jan 4, 2024

bump transformers and update attention class map name (#1023)

bcc78d8
unverified

winglian commited on Jan 3, 2024

Adds chat templates (#1022)

f8ae59b
unverified

mhenrichsen commited on Dec 29, 2023

feat: expose bnb kwargs (#1018)

41353d2
unverified

Nanobit

hamel commited on Dec 29, 2023

remove landmark attn and xpos rope implementations (#1010)

70b46ca
unverified

winglian commited on Dec 28, 2023

Feat: Warns to add to modules_to_save when adding tokens or switching special_tokens (#787)

1ffa386
unverified

Nanobit commited on Dec 22, 2023

Fix Deepspeed loading (#950)

5ea3aa3
unverified

winglian commited on Dec 13, 2023

Flash attn hotfix (#951)

f1f60cb
unverified

winglian commited on Dec 13, 2023

Mixtral official (#942)

7fabc4d
unverified

winglian commited on Dec 12, 2023

Mixtral multipack (#928)

68b227a
unverified

winglian commited on Dec 10, 2023

support for mamba (#915)

40a6362
unverified

winglian commited on Dec 9, 2023

fix(tokenizer): handle fast tokenizer properly for bos/eos (#914)

fde091c
unverified

Nanobit commited on Dec 8, 2023

feat: add check for quantized model (#913)

a581e9f
unverified

Nanobit

winglian commited on Dec 4, 2023

Support device_map=sequential & max_memory config parameters (#903)

992e742
unverified

Bryan Thornbury

winglian commited on Dec 4, 2023

fix for qwen w lora (#906)

3e3229e
unverified

winglian commited on Nov 30, 2023

Feat: Add Qwen (#894)

1115c50
unverified

Nanobit commited on Nov 25, 2023

Phi update 202311 (#876)

9bf854e
unverified

winglian commited on Nov 17, 2023

allow overriding of model_config parameters from the YML (#853)

1bc1186
unverified

winglian commited on Nov 16, 2023

fix model parallel (#816)

964d858
unverified

winglian commited on Nov 3, 2023

fix(tokenizer): update log order after update (#806)

10388a8
unverified

Nanobit commited on Oct 31, 2023

fix(config): Set eos/bos to tokenizer if different (#801)

637ed09
unverified

Nanobit commited on Oct 29, 2023

refactor neft patch to be more re-usable similar to trl's impl (#796)

827ec3d
unverified

winglian commited on Oct 29, 2023

chore: refactor truthy check and fix mypy (#780)

11d1d60
unverified

Nanobit commited on Oct 24, 2023

Implement fused modules (#747)

15d3a65
unverified

casperhansen

winglian commited on Oct 21, 2023

Fix(model): Linear detected and added to target module with rope linear (#738)

440c3ab
unverified

Nanobit commited on Oct 19, 2023

add noisy embedding (#721)

3bd9528
unverified

Maxime Maxime commited on Oct 13, 2023

Fix: Higher vram usage for mistral and sample_packing (#691)

669f1d0
unverified

Nanobit commited on Oct 6, 2023

flash_attention + sample packing for stablelm 3b (#671)

2d60ba3
unverified

winglian commited on Oct 5, 2023

Fix: ValueError when FA + Mistral when padding_side=right (#681)

eb480df
unverified

Nanobit commited on Oct 5, 2023

Fix(tokenizer): Set rstrip,lstrip,norm to False (#678)

e0b7eea
unverified

Nanobit commited on Oct 5, 2023

chore: Clean up repetitive model kwargs (#670)

e62d590
unverified

Nanobit commited on Oct 4, 2023

Feat: Allow usage of native Mistral FA when no sample_packing (#669)

697c50d
unverified

Nanobit commited on Oct 4, 2023

remove patch fix for phi (#664)

f34648c
unverified

winglian commited on Oct 3, 2023

Mistral flash attn packing (#646)

b6ab8aa
unverified

winglian commited on Sep 27, 2023

Commit History

Phi2 multipack (#1173) 814aee6 unverified

DPO cleanup (#1126) 7523d1f unverified

Falcon embeddings (#1149) [skip docker] e799e08 unverified

jupyter lab fixes (#1139) [skip ci] eaaeefc unverified

Qwen2 (#1166) f5a828a unverified

make sure the model config loader respects the model_revision too (#1160) [skip-ci] fccb542 unverified

Deprecate max packed sequence len (#1141) 2ce5c0d unverified

Multipack simplify for Mixtral (#1142) 6910e6a unverified

Add shifted sparse attention (#973) [skip-ci] 1d70f24 unverified

Add `layers_to_transform` for `lora_config` (#1118) 8487b97 unverified

keep gate in fp32 for 16 bit loras (#1105) da97285 unverified

add gptneox embeddings, fix phi2 inputs, also fix the casting (#1083) 78c5b19 unverified

be more robust about checking embedding modules for lora finetunes (#1074) [skip ci] 0f10080 unverified

fix: torch_dtype mistral default to fp32 (#1050) c3e8165 unverified

Phi2 rewrite (#1058) 732851f unverified

feature: better device mapping for large models (#918) bdfefaf unverified

RL/DPO (#935) f243c21

bump transformers and update attention class map name (#1023) bcc78d8 unverified

Adds chat templates (#1022) f8ae59b unverified

feat: expose bnb kwargs (#1018) 41353d2 unverified

remove landmark attn and xpos rope implementations (#1010) 70b46ca unverified

Feat: Warns to add to modules_to_save when adding tokens or switching special_tokens (#787) 1ffa386 unverified

Fix Deepspeed loading (#950) 5ea3aa3 unverified

Flash attn hotfix (#951) f1f60cb unverified

Mixtral official (#942) 7fabc4d unverified

Mixtral multipack (#928) 68b227a unverified

support for mamba (#915) 40a6362 unverified

fix(tokenizer): handle fast tokenizer properly for bos/eos (#914) fde091c unverified

feat: add check for quantized model (#913) a581e9f unverified

Support device_map=sequential & max_memory config parameters (#903) 992e742 unverified

fix for qwen w lora (#906) 3e3229e unverified

Feat: Add Qwen (#894) 1115c50 unverified

Phi update 202311 (#876) 9bf854e unverified

allow overriding of model_config parameters from the YML (#853) 1bc1186 unverified

fix model parallel (#816) 964d858 unverified

fix(tokenizer): update log order after update (#806) 10388a8 unverified

fix(config): Set eos/bos to tokenizer if different (#801) 637ed09 unverified

refactor neft patch to be more re-usable similar to trl's impl (#796) 827ec3d unverified

chore: refactor truthy check and fix mypy (#780) 11d1d60 unverified

Implement fused modules (#747) 15d3a65 unverified

Fix(model): Linear detected and added to target module with rope linear (#738) 440c3ab unverified

add noisy embedding (#721) 3bd9528 unverified

Fix: Higher vram usage for mistral and sample_packing (#691) 669f1d0 unverified

flash_attention + sample packing for stablelm 3b (#671) 2d60ba3 unverified

Fix: ValueError when FA + Mistral when padding_side=right (#681) eb480df unverified

Fix(tokenizer): Set rstrip,lstrip,norm to False (#678) e0b7eea unverified

chore: Clean up repetitive model kwargs (#670) e62d590 unverified

Feat: Allow usage of native Mistral FA when no sample_packing (#669) 697c50d unverified

remove patch fix for phi (#664) f34648c unverified

Mistral flash attn packing (#646) b6ab8aa unverified

Phi2 multipack (#1173)

814aee6
unverified

DPO cleanup (#1126)

7523d1f
unverified

Falcon embeddings (#1149) [skip docker]

e799e08
unverified

jupyter lab fixes (#1139) [skip ci]

eaaeefc
unverified

Qwen2 (#1166)

f5a828a
unverified

make sure the model config loader respects the model_revision too (#1160) [skip-ci]

fccb542
unverified

Deprecate max packed sequence len (#1141)

2ce5c0d
unverified

Multipack simplify for Mixtral (#1142)

6910e6a
unverified

Add shifted sparse attention (#973) [skip-ci]

1d70f24
unverified

Add `layers_to_transform` for `lora_config` (#1118)

8487b97
unverified

keep gate in fp32 for 16 bit loras (#1105)

da97285
unverified

add gptneox embeddings, fix phi2 inputs, also fix the casting (#1083)

78c5b19
unverified

be more robust about checking embedding modules for lora finetunes (#1074) [skip ci]

0f10080
unverified

fix: torch_dtype mistral default to fp32 (#1050)

c3e8165
unverified

Phi2 rewrite (#1058)

732851f
unverified

feature: better device mapping for large models (#918)

bdfefaf
unverified

RL/DPO (#935)

f243c21

bump transformers and update attention class map name (#1023)

bcc78d8
unverified

Adds chat templates (#1022)

f8ae59b
unverified

feat: expose bnb kwargs (#1018)

41353d2
unverified

remove landmark attn and xpos rope implementations (#1010)

70b46ca
unverified

Feat: Warns to add to modules_to_save when adding tokens or switching special_tokens (#787)

1ffa386
unverified

Fix Deepspeed loading (#950)

5ea3aa3
unverified

Flash attn hotfix (#951)

f1f60cb
unverified

Mixtral official (#942)

7fabc4d
unverified

Mixtral multipack (#928)

68b227a
unverified

support for mamba (#915)

40a6362
unverified

fix(tokenizer): handle fast tokenizer properly for bos/eos (#914)

fde091c
unverified

feat: add check for quantized model (#913)

a581e9f
unverified

Support device_map=sequential & max_memory config parameters (#903)

992e742
unverified

fix for qwen w lora (#906)

3e3229e
unverified

Feat: Add Qwen (#894)

1115c50
unverified

Phi update 202311 (#876)

9bf854e
unverified

allow overriding of model_config parameters from the YML (#853)

1bc1186
unverified

fix model parallel (#816)

964d858
unverified

fix(tokenizer): update log order after update (#806)

10388a8
unverified

fix(config): Set eos/bos to tokenizer if different (#801)

637ed09
unverified

refactor neft patch to be more re-usable similar to trl's impl (#796)

827ec3d
unverified

chore: refactor truthy check and fix mypy (#780)

11d1d60
unverified

Implement fused modules (#747)

15d3a65
unverified

Fix(model): Linear detected and added to target module with rope linear (#738)

440c3ab
unverified

add noisy embedding (#721)

3bd9528
unverified

Fix: Higher vram usage for mistral and sample_packing (#691)

669f1d0
unverified

flash_attention + sample packing for stablelm 3b (#671)

2d60ba3
unverified

Fix: ValueError when FA + Mistral when padding_side=right (#681)

eb480df
unverified

Fix(tokenizer): Set rstrip,lstrip,norm to False (#678)

e0b7eea
unverified

chore: Clean up repetitive model kwargs (#670)

e62d590
unverified

Feat: Allow usage of native Mistral FA when no sample_packing (#669)

697c50d
unverified

remove patch fix for phi (#664)

f34648c
unverified

Mistral flash attn packing (#646)

b6ab8aa
unverified