Model Request: ByteDance-Seed/Seed-OSS-36B-Instruct
As the title says. I believe it's basically Qwen 2.5.
36B params... I can't even run it right now.
Will look into it
Thanks! I know it's at least similar enough that inference frameworks can import it as Qwen 2.5/Llama, but I figured it would be convenient for training.
I can try the conversion script if you want, or even make a tight quant to run.
Sure, if you implement the conversion script I can add you to the llamafy
org if you want to upload it here :)
For Qwen2/Qwen2.5 weights I use this script to convert: https://gist.github.com/fakerybakery/0c296b0f1b595bef2b7417b1f67916f9
Script seems to have worked, thanks! I modified the save function to work with newer versions of transformers:
https://gist.github.com/Downtown-Case/ef9aa9677d68f8eec29a54a35d6445b7
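For anyone curious what a "llamafy" conversion involves: since the weights map onto Llama one-for-one, most of the work is a config rewrite. Here is a minimal sketch assuming a Qwen2-style config; the helper name and field handling are illustrative and the linked gists may differ (e.g. in how they handle the tokenizer or sharded weights):

```python
import json

def llamafy_config(config: dict) -> dict:
    """Rewrite a Qwen2-style config dict so it declares the Llama
    architecture. Qwen2 uses q/k/v biases, which Llama-class models in
    recent transformers versions support via the `attention_bias` flag.
    (Hypothetical sketch; the actual conversion scripts may differ.)"""
    out = dict(config)
    out["architectures"] = ["LlamaForCausalLM"]
    out["model_type"] = "llama"
    out["attention_bias"] = True          # keep Qwen2's q/k/v biases
    # Drop Qwen2-only fields that Llama configs don't use:
    out.pop("use_sliding_window", None)
    out.pop("sliding_window", None)
    out.pop("max_window_layers", None)
    return out

# Example Qwen2-style config fragment (values are illustrative):
qwen_cfg = {
    "architectures": ["Qwen2ForCausalLM"],
    "model_type": "qwen2",
    "hidden_size": 5120,
    "use_sliding_window": False,
}
llama_cfg = llamafy_config(qwen_cfg)
print(json.dumps(llama_cfg, indent=2))
```

Fields the target config doesn't recognize are usually just ignored by transformers, so the main risk is losing behavior (like the bias handling) rather than crashing on load.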
Saved tensors look similar, but I will test if it actually inferences coherently in a bit. If it does, I'll upload here if you add me.
Ha! It's literally token identical, or at least it is when loaded via bitsandbytes:
```
~/AI/scripts
venv ❯ python test_byte.py
ByteDance-Seed_Seed-OSS-36B-Instruct
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
Loading checkpoint shards: 100%|████████████████████████████████████████| 15/15 [00:28<00:00, 1.92s/it]
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
<seed:bos>system
You are an intelligent assistant that can answer questions in one step without the need for reasoning and thinking, that is, your thinking budget is 0. Next, please skip the thinking process and directly start answering the user's questions.
<seed:eos><seed:bos>user
How to make pasta?<seed:eos><seed:bos>assistant
<seed:think><seed:cot_budget_reflect>The current thinking budget is 0, so I will directly start answering the question.</seed:cot_budget_reflect>
</seed:think>To make pasta, follow these key steps:
### **1. Prepare the Dough**
- **Ingredients**: 500g (3½ cups) all-purpose or bread
```
```
venv ❯ python test_byte_llamafied.py
ByteDance-Seed_Seed-OSS-36B-Instruct-llamafie
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
Loading checkpoint shards: 100%|████████████████████████████████████████| 40/40 [00:19<00:00, 2.07it/s]
The following generation flags are not valid and may be ignored: ['temperature', 'top_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
<seed:bos>system
You are an intelligent assistant that can answer questions in one step without the need for reasoning and thinking, that is, your thinking budget is 0. Next, please skip the thinking process and directly start answering the user's questions.
<seed:eos><seed:bos>user
How to make pasta?<seed:eos><seed:bos>assistant
<seed:think><seed:cot_budget_reflect>The current thinking budget is 0, so I will directly start answering the question.</seed:cot_budget_reflect>
</seed:think>To make pasta, follow these key steps:
### **1. Prepare the Dough**
- **Ingredients**: 500g (3½ cups) all-purpose or bread
```
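The token-identical check above amounts to greedy-decoding both checkpoints on the same prompt and comparing the generated token ids. A sketch of that comparison follows; the actual contents of `test_byte.py` aren't shown, so the transformers lines in the comments are illustrative assumptions:

```python
def first_divergence(ids_a, ids_b):
    """Return the index of the first differing token id between two
    generations, or None if the sequences are token identical."""
    for i, (a, b) in enumerate(zip(ids_a, ids_b)):
        if a != b:
            return i
    if len(ids_a) != len(ids_b):
        # One generation is a strict prefix of the other.
        return min(len(ids_a), len(ids_b))
    return None

# In practice the ids would come from greedy decoding, e.g. (sketch):
#   from transformers import AutoModelForCausalLM, BitsAndBytesConfig
#   quant = BitsAndBytesConfig(load_in_4bit=True)  # avoids the deprecation warning above
#   model = AutoModelForCausalLM.from_pretrained(path, quantization_config=quant,
#                                                device_map="auto")
#   out = model.generate(**inputs, do_sample=False, max_new_tokens=256)
# Greedy decoding (do_sample=False) makes the comparison deterministic, which is
# also why the 'temperature'/'top_p' flags in the logs are reported as ignored.

original = [4438, 311, 1304, 37556, 30]   # made-up ids for illustration
converted = [4438, 311, 1304, 37556, 30]
print(first_divergence(original, converted))  # None -> token identical
```

An exact token match under greedy decoding is a stronger guarantee than "looks coherent", since any weight-mapping mistake would almost certainly diverge within a few tokens.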
I don't understand, why specify a new architecture that's not supported in anything when it's the same as llama?
I sent you an invite, let me know if it works!
> I don't understand, why specify a new architecture that's not supported in anything when it's the same as llama?
I think companies often like to do this; e.g. Yi was essentially the same architecture as Llama. It probably looks better to investors to have the architecture set as "seed_oss" rather than "llama", and it might cause confusion if people think the model is Llama-based. Though it does come at the cost of ease of use, and that is why this org exists!
It works, thanks!
Gotta do something, but will upload conversions of the three models later today.
Yeah, I am handyconstructiongnat. Fair warning, I don't check Discord much... It doesn't even want me to login at the moment, heh.