---
license: apache-2.0
base_model: speakleash/Bielik-4.5B-v3
language:
  - pl
library_name: transformers
tags:
  - finetuned
inference:
  parameters:
    temperature: 0.4
widget:
  - messages:
      - role: user
        content: Co przedstawia polskie godło?
extra_gated_description: >-
  If you want to learn more about how you can use the model, please refer to our
  <a href="https://bielik.ai/terms/">Terms of Use</a>.
---

# Bielik-4.5B-v3-Instruct

Bielik-4.5B-v3-Instruct is a generative text model featuring 4.6 billion parameters. It is an instruct fine-tuned version of Bielik-4.5B-v3. The model is the result of a unique collaboration between the open-science/open-source project SpeakLeash and the High Performance Computing (HPC) center ACK Cyfronet AGH. Developed and trained on Polish text corpora, cherry-picked and processed by the SpeakLeash team, this endeavor leverages Polish large-scale computing infrastructure within the PLGrid environment, specifically the HPC center ACK Cyfronet AGH. The creation and training of Bielik-4.5B-v3-Instruct were supported by computational grants no. PLG/2024/017214 and PLG/2025/018338, carried out on the Athena and Helios supercomputers, enabling the use of cutting-edge technology and computational resources essential for large-scale machine learning. As a result, the model exhibits an exceptional ability to understand and process the Polish language, providing accurate responses and performing a variety of linguistic tasks with high precision.

📚 Technical report: https://arxiv.org/abs/2505.02550

## Model

The SpeakLeash team is working on its own set of instructions in Polish, which is continuously being expanded and refined by annotators. A portion of these instructions, manually verified and corrected, has been used for training. Moreover, due to the limited availability of high-quality instructions in Polish, synthetic instructions were generated with Bielik 11B v2.3 and used in training. The dataset used for training comprised over 19 million instructions, consisting of more than 12 billion tokens.

To align the model with user preferences, we tested many different techniques: DPO, PPO, KTO, and SiMPO. Ultimately, the DPO-Positive method was employed, using both generated and manually corrected examples scored by a metamodel. The dataset comprised over 111,000 examples of varying lengths, addressing different aspects of response style; it was filtered and evaluated by a reward model to select instructions with the right level of difference between the chosen and rejected responses. A novelty introduced in the DPO-Positive stage was the inclusion of multi-turn conversations.
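For readers less familiar with DPO-Positive, the snippet below sketches the general shape of the objective: a standard DPO preference term plus a penalty that discourages the policy from lowering the log-probability of the chosen response below the reference model's. This is an illustrative approximation rather than the actual training code; the hyperparameter names (`beta`, `lambda_dpop`) and their default values are assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_positive_loss(policy_chosen_logps, policy_rejected_logps,
                      ref_chosen_logps, ref_rejected_logps,
                      beta=0.1, lambda_dpop=50.0):
    """Illustrative DPO-Positive loss over per-sequence log-probabilities."""
    # Standard DPO log-ratio terms for the chosen and rejected responses
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Penalty is non-zero only when the policy assigns the chosen response
    # a lower log-probability than the reference model does
    penalty = torch.clamp(ref_chosen_logps - policy_chosen_logps, min=0.0)
    logits = beta * (chosen_ratio - rejected_ratio - lambda_dpop * penalty)
    return -F.logsigmoid(logits).mean()
```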

Bielik instruct models have been trained with the use of an original open-source framework called ALLaMo, implemented by Krzysztof Ociepa. This framework allows users to train language models with an architecture similar to LLaMA and Mistral in a fast and efficient way.

Model description:

## Chat template

Bielik-4.5B-v3-Instruct uses ChatML as the prompt format.

E.g.

```python
prompt = "<s><|im_start|> user\nJakie mamy pory roku?<|im_end|> \n<|im_start|> assistant\n"
completion = "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima.<|im_end|> \n"
```

This format is available as a chat template via the apply_chat_template() method:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model_name = "speakleash/Bielik-4.5B-v3-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

messages = [
    {"role": "system", "content": "Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim."},
    {"role": "user", "content": "Jakie mamy pory roku w Polsce?"},
    {"role": "assistant", "content": "W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima."},
    {"role": "user", "content": "Która jest najcieplejsza?"}
]

input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = input_ids.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```
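For quick experiments, the model can also be queried through the high-level `pipeline` API; this is a minimal sketch, and the sampling settings below (temperature 0.4, matching the inference parameters in the metadata above) are only a suggested starting point. Recent versions of transformers apply the chat template automatically when the input is a list of messages.

```python
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="speakleash/Bielik-4.5B-v3-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Jakie mamy pory roku w Polsce?"}]

# The chat template is applied under the hood for message-style inputs
outputs = generator(messages, max_new_tokens=256, do_sample=True, temperature=0.4)
print(outputs[0]["generated_text"][-1]["content"])
```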

The input conversation from the previous example, fully formatted by apply_chat_template:

```
<s><|im_start|> system
Odpowiadaj krótko, precyzyjnie i wyłącznie w języku polskim.<|im_end|> 
<|im_start|> user
Jakie mamy pory roku w Polsce?<|im_end|> 
<|im_start|> assistant
W Polsce mamy 4 pory roku: wiosna, lato, jesień i zima.<|im_end|> 
<|im_start|> user
Która jest najcieplejsza?<|im_end|>
```
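When the last turn in `messages` is a user message and a fresh assistant reply is expected, `add_generation_prompt=True` can be passed to `apply_chat_template` so that the assistant header is appended and generation starts in the right place. A short sketch reusing the objects from the example above (the exact rendered header depends on the tokenizer's chat template):

```python
# Append the assistant header so the model continues as the assistant
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(device)

generated_ids = model.generate(input_ids, max_new_tokens=1000, do_sample=True)
print(tokenizer.batch_decode(generated_ids)[0])
```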

## Limitations and Biases

Bielik-4.5B-v3-Instruct is a quick demonstration that the base model can be easily fine-tuned to achieve compelling and promising performance. It does not have any moderation mechanisms. We look forward to engaging with the community on ways to make the model respect guardrails, allowing for deployment in environments requiring moderated outputs.

Bielik-4.5B-v3-Instruct can produce factually incorrect output and should not be relied on to produce factually accurate data. Bielik-4.5B-v3-Instruct was trained on various public datasets. While great efforts have been taken to clean the training data, it is possible that this model can generate lewd, false, biased or otherwise offensive outputs.

## Citation

Please cite this model using the following format:

```bibtex
@misc{ociepa2025bielikv3smalltechnical,
      title={Bielik v3 Small: Technical Report},
      author={Krzysztof Ociepa and Łukasz Flis and Remigiusz Kinas and Krzysztof Wróbel and Adrian Gwoździej},
      year={2025},
      eprint={2505.02550},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.02550},
}

@misc{Bielik45Bv3i,
    title     = {Bielik-4.5B-v3-Instruct model card},
    author    = {Ociepa, Krzysztof and Flis, Łukasz and Kinas, Remigiusz and Gwoździej, Adrian and Wróbel, Krzysztof and {SpeakLeash Team} and {Cyfronet Team}},
    year      = {2025},
    url       = {https://huggingface.co/speakleash/Bielik-4.5B-v3-Instruct},
    note      = {Accessed: 2025-05-06}, % change this date
    urldate   = {2025-05-06} % change this date
}
```

## Responsible for training the model

- Krzysztof Ociepa (SpeakLeash) - team leadership, conceptualizing, data preparation, process optimization and oversight of training
- Łukasz Flis (Cyfronet AGH) - coordinating and supervising the training
- Remigiusz Kinas (SpeakLeash) - conceptualizing, coordinating RL trainings, data preparation, benchmarking and quantizations
- Adrian Gwoździej (SpeakLeash) - data preparation and ensuring data quality
- Krzysztof Wróbel (SpeakLeash) - benchmarks

The model could not have been created without the commitment and work of the entire SpeakLeash team, whose contribution is invaluable. Thanks to the hard work of many individuals, it was possible to gather a large amount of content in Polish and establish collaboration between the open-science SpeakLeash project and the HPC center ACK Cyfronet AGH. Individuals who contributed to the creation of the model: Sebastian Kondracki, Igor Ciuciura, Szymon Baczyński, Jacek Chwiła, Dominika Basaj, Kuba Sołtys, Karol Jezierski, Anna Przybył, Agnieszka Ratajska, Witold Wydmański, Izabela Babis, Nina Babis.

Members of the ACK Cyfronet AGH team providing valuable support and expertise: Szymon Mazurek, Marek Magryś, Mieszko Cholewa.

We gratefully acknowledge the Polish high-performance computing infrastructure PLGrid (HPC Center: ACK Cyfronet AGH) for providing computer facilities and support within computational grants no. PLG/2024/017214 and PLG/2025/018338.

## Contact Us

If you have any questions or suggestions, please use the discussion tab. If you want to contact us directly, join our Discord SpeakLeash.