Study Group [Accountability, Discussions, Resources and Common Doubt Resolutions]

#4
by KanishkNoir - opened

Hey guys, I am starting this study group for anyone who wants to learn together rather than finishing the course alone.
Key Points for this group:

  1. You can ask fellow members to motivate you in case you procrastinate a lot (like me).
  2. If you find a topic interesting and come across extra resources, you can share them with the group.
  3. Resolving common doubts that we face while learning.
  4. We can add more points later :p

Leave your Discord in the comments or take part directly here!

a smol course org

Nice work @KanishkNoir . Feel free to share any practical information like weekly calls or check-ins.

Module: Instruction Tuning
Section: Chat Template

When running text-generation, if your response contains <think> tokens you can switch to a standard response by disabling the enable_thinking param when applying the chat template!
Reason: the tokenizer we are using right now has thinking mode enabled by default :)


from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

pipe = pipeline("text-generation", "HuggingFaceTB/SmolLM3-3B", tokenizer=tokenizer, device_map="auto")

messages = [
    {"role": "system", "content": "You are a angry chatbot that responds in the style of a wild west cowboy."},
    {"role": "user", "content": "Hello, how are you?"}
]

# Apply the chat template with thinking disabled
thinking_disabled_chat = tokenizer.apply_chat_template(
    messages, 
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False
)

# Use the formatted template in pipeline
response = pipe(thinking_disabled_chat, max_new_tokens=128, temperature=0.7, return_full_text=False)


print(response[0]['generated_text'])

@KanishkNoir Hello!

You can also add a tools parameter and pass tool definitions into the tokenizer, which wraps them in the model's tool tokens! The idea is to leverage AutoTokenizer as a templating engine and make tool calls more deterministic by using tokens the model has seen during post-training or fine-tuning. Leveraging Jinja templates wherever possible is pretty much necessary to reproduce performance across tasks. Of course, this can also be achieved with plain prompts. There is also a documents parameter for RAG scenarios.

I haven't tested these yet but have been researching as I plan an implementation of something like the OpenAI Responses API.

thinking_disabled_chat = tokenizer.apply_chat_template(
    messages, 
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
    tools=tools
)
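
Since I haven't tested this yet, here is just a sketch of what the inputs could look like, reusing the tokenizer and messages from the snippet above. The tool function and document contents are made up, and whether the rendered prompt actually includes tools or documents depends on the model's chat template.

# Hypothetical tool: apply_chat_template accepts plain Python functions
# with type hints and a Google-style docstring (a list of JSON schemas also works).
def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny"

tools = [get_weather]

# Hypothetical documents for a RAG-style prompt; the exact keys the template
# expects (title/text/contents) vary by model.
documents = [
    {"title": "Course notes", "text": "Chat templates wrap messages in the model's special tokens."},
]

rag_and_tools_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    tools=tools,
    documents=documents
)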

Another interesting feature I finally leveraged recently was

return_tensors="np"

which returns tokens as a numpy array, very useful for manipulation. I do a lot of work with OpenVINO, and wrapping the output in ov.Tensor makes AutoTokenizers usable with OpenVINO runtime outside of Optimum-Intel. A bit off topic, but related to apply_chat_template.

        prompt_token_ids = self.encoder_tokenizer.apply_chat_template(
            messages,
            add_generation_prompt=True,
            skip_special_tokens=True,
            return_tensors="np"
        )
        return ov.Tensor(prompt_token_ids)

Very nice. It gets around Jinja errors in OpenVINO GenAI and adds enormous flexibility.

Bonus, because nerd sniping: this took me a long time to figure out. Another class, AutoProcessor, can be used for multimodal input; it handles formatting of images and other modalities with lots of flexibility across supported architectures. For chat scenarios with models that support image input in Transformers, we do not need to pass an image with every prompt, just like in ChatGPT, Gemini, Claude, etc., even though those models are multimodal-capable. So I came up with this snippet to use in a FastAPI application where we expect messages to contain a base64 image and be formatted with role/content keys. Maybe there are simpler ways to do this, but here was my solution, cleaned up to share:

import base64
from io import BytesIO
from PIL import Image

images = []             # PIL images extracted from the messages
text_conversation = []  # messages with image parts stripped out

# Iterate over the messages list
for message in messages:
    # Handle multimodal messages: check if "content" is a list
    if isinstance(message.get("content", ""), list):
        text_parts = []

        for content_item in message["content"]:
            # Case 1: Image content
            if isinstance(content_item, dict) and content_item.get("type") == "image_url":
                image_url = content_item.get("image_url", {})

                # Check if the image is embedded as base64
                if isinstance(image_url, dict) and image_url.get("url", "").startswith("data:image/"):
                    base64_data = image_url["url"].split(",", 1)
                    if len(base64_data) > 1:
                        # Decode base64 string into binary
                        image_data = base64.b64decode(base64_data[1])

                        # Convert binary into a PIL image (force RGB mode)
                        image = Image.open(BytesIO(image_data)).convert("RGB")
                        images.append(image)

            # Case 2: Text content
            elif isinstance(content_item, dict) and content_item.get("type") == "text":
                text_parts.append(content_item.get("text", ""))

        # Build a cleaned message object containing only text
        if text_parts:
            text_message = message.copy()
            text_message["content"] = " ".join(text_parts)
            text_conversation.append(text_message)
        else:
            # Even if no text, keep the message with empty content
            text_message = message.copy()
            text_message["content"] = ""
            text_conversation.append(text_message)

    else:
        # If "content" is not a list, append the raw message as-is
        text_conversation.append(message)

# Apply the processor's chat template to the cleaned, text-only conversation
text_prompt = self.processor.apply_chat_template(
    text_conversation,
    add_generation_prompt=True
)

# Prepare processor inputs depending on whether images were found
if images:
    inputs = self.processor(
        text=[text_prompt],
        images=[images],
        padding=True,
        return_tensors="pt"
    )
else:  # text-only request: skip the images argument entirely
    inputs = self.processor(
        text=[text_prompt],
        padding=True,
        return_tensors="pt"
    )
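
For completeness, here is a minimal sketch of the generation step with these inputs. It assumes the matching vision-language model is already loaded on the same class as self.model (that name, and max_new_tokens=256, are assumptions, not part of the snippet above).

# Minimal sketch: generate from the processor inputs and decode the new tokens.
generated_ids = self.model.generate(**inputs, max_new_tokens=256)

# Drop the prompt tokens so only the newly generated text is decoded.
new_tokens = generated_ids[:, inputs["input_ids"].shape[1]:]
output_text = self.processor.batch_decode(new_tokens, skip_special_tokens=True)[0]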

@Echo9Zulu Hey, that was a really cool read! I will try to experiment with this as I progress with the course (and off-course too)! I'm really intrigued by the return_tensors="np" part; I'll try it later to see where I can leverage it :)
Thanks for the input and all the additional theory!
