|
---
library_name: transformers
tags: []
---
|
|
|
This repository contains the text-only LLM portion of `meta-llama/Llama-3.2-11B-Vision-Instruct`, i.e. its language model with the vision cross-attention layers removed, repackaged as a plain Llama causal LM.
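For example, a minimal text-generation sketch (`path/to/this_repo` is a placeholder for this repository's actual id):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "path/to/this_repo"  # placeholder: substitute this repository's id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```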
|
|
|
**How it was done** |
|
|
|
```python
import torch
from collections import OrderedDict

from transformers import AutoModelForCausalLM, MllamaForConditionalGeneration
from transformers.models.mllama.modeling_mllama import MllamaCrossAttentionDecoderLayer

llama32_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
llama32 = MllamaForConditionalGeneration.from_pretrained(
    llama32_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
)

# Cross-attention layers only take effect when an image is provided.
# Drop them, since we want a text-only model.
new_layers = []
for layer in llama32.language_model.model.layers:
    if isinstance(layer, MllamaCrossAttentionDecoderLayer):
        continue
    new_layers.append(layer)
# Clear the recorded cross-attention layer indices as well.
llama32.language_model.model.cross_attention_layers = []
llama32.language_model.model.layers = torch.nn.ModuleList(new_layers)

# Now llama32.language_model is identical to Llama-3.1-8B-Instruct, except for
# the embedding size (+8 rows), see:
# https://github.com/huggingface/transformers/blob/a22a4378d97d06b7a1d9abad6e0086d30fdea199/src/transformers/models/mllama/modeling_mllama.py#L1667C9-L1667C26
new_llama32_state_dict = OrderedDict()
for k, v in llama32.language_model.state_dict().items():
    if k == "model.embed_tokens.weight":
        # Truncate the embedding matrix back to the Llama 3.1 vocab size.
        v = v[:128256, :]
    new_llama32_state_dict[k] = v

# Load a Llama 3.1 model to provide the target architecture.
llama31_id = "meta-llama/Llama-3.1-8B-Instruct"
llama31 = AutoModelForCausalLM.from_pretrained(
    llama31_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda:1",
)

llama31.load_state_dict(new_llama32_state_dict)
# <All keys matched successfully>

llama31.save_pretrained("./my-cool-llama3.2")
```
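As a sanity check, one might compare logits on a text-only prompt (a sketch, not part of the original conversion; it assumes `llama32` and `llama31` from the snippet above are still in scope and uses arbitrary token ids):

```python
import torch

# A text-only prompt: arbitrary token ids from the shared Llama 3 vocab.
prompt_ids = torch.tensor([[128000, 9906, 1917]])

with torch.no_grad():
    ref = llama32.language_model(prompt_ids.to("cuda:0")).logits
    new = llama31(prompt_ids.to("cuda:1")).logits.to("cuda:0")

# With the cross-attention layers gone, the remaining self-attention stack
# mirrors the Llama architecture, so the logits are expected to agree.
v = min(ref.shape[-1], new.shape[-1])
print(torch.allclose(ref[..., :v], new[..., :v]))
```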
|
|
|
|
|
**Note:** |
|
|
|
In the original tokenizer, `tokenizer.chat_template` contains a `date_string` variable, which appends the current date to the prompt when you call `tokenizer.apply_chat_template(messages)`.

I removed this behavior in this repo; please be aware of the difference when you use `AutoTokenizer.from_pretrained(this_repo)`.
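To see the difference, a quick sketch (`path/to/this_repo` is again a placeholder for this repository's id):

```python
from transformers import AutoTokenizer

messages = [{"role": "user", "content": "Hello!"}]

# Upstream tokenizer: the chat template inserts a date string
# (today's date by default).
upstream = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")
print(upstream.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

# This repo's tokenizer: the same call produces a prompt without the date.
modified = AutoTokenizer.from_pretrained("path/to/this_repo")  # placeholder
print(modified.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```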
|
|