---
library_name: transformers
tags: []
---

This repository contains the text-only LLM portion of `meta-llama/Llama-3.2-11B-Vision-Instruct`, i.e. its language model with the cross-attention (vision) layers removed.

**How it was done**

```python
import torch
from collections import OrderedDict
from transformers import MllamaForConditionalGeneration, AutoModelForCausalLM
from transformers.models.mllama.modeling_mllama import MllamaCrossAttentionDecoderLayer

llama32_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
llama32 = MllamaForConditionalGeneration.from_pretrained(
    llama32_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
)


new_layers = []
for layer in llama32.language_model.model.layers:
    if isinstance(layer, MllamaCrossAttentionDecoderLayer):
        # Cross-attention layers only take effect when an image is provided.
        # Skip them here, since we want a text-only model.
        pass
    else:
        new_layers.append(layer)
llama32.language_model.model.cross_attention_layers = []
llama32.language_model.model.layers = torch.nn.ModuleList(new_layers)


# Now llama32.language_model is identical to Llama-3.1-8B-Instruct, except for the embedding size (+8 rows);
# see: https://github.com/huggingface/transformers/blob/a22a4378d97d06b7a1d9abad6e0086d30fdea199/src/transformers/models/mllama/modeling_mllama.py#L1667C9-L1667C26
new_llama32_state_dict = OrderedDict()
for k, v in llama32.language_model.state_dict().items():
    if k == "model.embed_tokens.weight":
        # Drop the extra embedding rows so the matrix matches the 128256-token Llama 3.1 vocabulary.
        v = v[:128256, :]
    new_llama32_state_dict[k] = v


# Load Llama 3.1 8B Instruct to provide the target text-only architecture
llama31_id = "meta-llama/Llama-3.1-8B-Instruct"
llama31 = AutoModelForCausalLM.from_pretrained(
    llama31_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda:1",
)

llama31.load_state_dict(new_llama32_state_dict)
# <All keys matched successfully>

llama31.save_pretrained("./my-cool-llama3.2")
```
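For reference, here is a minimal sketch of loading the extracted model back for text generation. It assumes the local path `./my-cool-llama3.2` saved by the script above; `"<this_repo>"` is a placeholder for this repository's Hub id if you prefer to load from the Hub.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Local path produced by the extraction script above;
# replace with "<this_repo>" (this repository's Hub id) to load from the Hub.
model_path = "./my-cool-llama3.2"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the Llama architecture in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate text and print only the newly generated tokens.
output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```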


**Note:**

The original tokenizer's `tokenizer.chat_template` contains a `date_string` variable, which appends the current date when `tokenizer.apply_chat_template(messages)` is called.

That behavior has been removed in this repo, so be aware of the difference when you use `AutoTokenizer.from_pretrained(this_repo)`.
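A quick way to see the difference is to render the same conversation with both templates and compare the prompts. This is only an illustrative sketch; `"<this_repo>"` again stands in for this repository's id, and the exact prompt text depends on the template each repo ships.

```python
from transformers import AutoTokenizer

original = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")
modified = AutoTokenizer.from_pretrained("<this_repo>")  # placeholder for this repository

messages = [{"role": "user", "content": "Hello!"}]

# The original template fills in the current date via `date_string`;
# this repo's template omits that, so the rendered prompts differ.
print(original.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
print(modified.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```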