|
---
library_name: transformers
tags: []
---
|
|
|
This repository contains the text-only LLM portion of `meta-llama/Llama-3.2-11B-Vision-Instruct`, i.e. its language model with the vision cross-attention layers removed, repackaged as a plain Llama causal LM.
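For example, a minimal text-generation sketch (`path/to/this_repo` is a placeholder for this repository's actual id):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "path/to/this_repo"  # placeholder: substitute this repository's id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```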
|
|
|
**How it was done** |
|
|
|
```python
import torch
from collections import OrderedDict

from transformers import AutoModelForCausalLM, MllamaForConditionalGeneration
from transformers.models.mllama.modeling_mllama import MllamaCrossAttentionDecoderLayer

llama32_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
llama32 = MllamaForConditionalGeneration.from_pretrained(
    llama32_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
)

# Cross-attention layers only take effect when an image is provided.
# Drop them, since we want a text-only model.
new_layers = []
for layer in llama32.language_model.model.layers:
    if isinstance(layer, MllamaCrossAttentionDecoderLayer):
        continue
    new_layers.append(layer)
# Clear the recorded cross-attention layer indices as well.
llama32.language_model.model.cross_attention_layers = []
llama32.language_model.model.layers = torch.nn.ModuleList(new_layers)

# Now llama32.language_model is identical to Llama-3.1-8B-Instruct, except for
# the embedding size (+8 rows), see:
# https://github.com/huggingface/transformers/blob/a22a4378d97d06b7a1d9abad6e0086d30fdea199/src/transformers/models/mllama/modeling_mllama.py#L1667C9-L1667C26
new_llama32_state_dict = OrderedDict()
for k, v in llama32.language_model.state_dict().items():
    if k == "model.embed_tokens.weight":
        # Truncate the embedding matrix back to the Llama 3.1 vocab size.
        v = v[:128256, :]
    new_llama32_state_dict[k] = v

# Load a Llama 3.1 model to provide the target architecture.
llama31_id = "meta-llama/Llama-3.1-8B-Instruct"
llama31 = AutoModelForCausalLM.from_pretrained(
    llama31_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda:1",
)

llama31.load_state_dict(new_llama32_state_dict)
# <All keys matched successfully>

llama31.save_pretrained("./my-cool-llama3.2")
```
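As a sanity check, one might compare logits on a text-only prompt (a sketch, not part of the original conversion; it assumes `llama32` and `llama31` from the snippet above are still in scope and uses arbitrary token ids):

```python
import torch

# A text-only prompt: arbitrary token ids from the shared Llama 3 vocab.
prompt_ids = torch.tensor([[128000, 9906, 1917]])

with torch.no_grad():
    ref = llama32.language_model(prompt_ids.to("cuda:0")).logits
    new = llama31(prompt_ids.to("cuda:1")).logits.to("cuda:0")

# With the cross-attention layers gone, the remaining self-attention stack
# mirrors the Llama architecture, so the logits are expected to agree.
v = min(ref.shape[-1], new.shape[-1])
print(torch.allclose(ref[..., :v], new[..., :v]))
```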
|
|
|
|
|
**Note:** |
|
|
|
In the original tokenizer, `tokenizer.chat_template` contains a `date_string` variable, which appends the current date to the prompt when you call `tokenizer.apply_chat_template(messages)`.

I removed this behavior in this repo; please be aware of the difference when you use `AutoTokenizer.from_pretrained(this_repo)`.
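To see the difference, a quick sketch (`path/to/this_repo` is again a placeholder for this repository's id):

```python
from transformers import AutoTokenizer

messages = [{"role": "user", "content": "Hello!"}]

# Upstream tokenizer: the chat template inserts a date string
# (today's date by default).
upstream = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")
print(upstream.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

# This repo's tokenizer: the same call produces a prompt without the date.
modified = AutoTokenizer.from_pretrained("path/to/this_repo")  # placeholder
print(modified.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```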
|
|