---
license: gemma
language:
- de
base_model:
- google/gemma-2-27b-it
pipeline_tag: question-answering
tags:
- Connect-Transport
- Logics Software
- German support chatbot
- Deutscher KI Chatbot
- Kundenservice Chatbot
- Deutscher Chatbot
- KI-Chatbots für Unternehmen
- Chatbot for SMEs
- Question-answering
- QLoRA fine-tuning
- LLM training
library_name: transformers
---

# Model Card for logicsct-logicsct-gemma2it27b

**logicsct-logicsct-gemma2it27b** is a QLoRA 4-bit fine-tuned version of [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it). The model has been adapted with domain-specific knowledge to serve as a support chatbot for [**Connect-Transport**](https://www.logics-connect.de), the transport management system developed at Logics Software GmbH.

While tailored to our internal use case, the training principles and techniques we employed can also be applied by others who want to develop their own chatbot assistants.

We continuously evaluate and refine our models to improve the performance of our support chatbot for Connect-Transport.

## Finding a Good Base Model – Proficient in German and Following Instructions

We evaluated over 70 models on basic technical instruction tasks in German. The evaluation was carried out manually by reviewing the responses to the following questions:

- Wie kann ich in Chrome machen, dass meine Downloads immer am gleichen Ort gespeichert werden?
- Wie kann ich in Outlook meine Mail-Signatur anpassen und einen Link und ein Bild dort einfügen?

The best models, according to our subjective rating scale (1 = poor, 5 = excellent), are:

5-Star Rating:
- Big proprietary models such as OpenAI o1, OpenAI 4o, and OpenAI o1-mini
- Huge models: [deepseek-ai/DeepSeek-R1 (685B)](https://huggingface.co/deepseek-ai/DeepSeek-R1), [deepseek-ai/DeepSeek-V3 (685B)](https://huggingface.co/deepseek-ai/DeepSeek-V3), and [mistralai/Mistral-Large-Instruct-2411 (123B)](https://huggingface.co/mistralai/Mistral-Large-Instruct-2411)
- Large models: [Nexusflow/Athene-V2-Chat (72.7B)](https://huggingface.co/Nexusflow/Athene-V2-Chat) and [nvidia/Llama-3.1-Nemotron-70B-Instruct (70.6B)](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct)

4-Star Rating:
- Huge models: [mistralai/Mixtral-8x22B-Instruct-v0.1 (141B)](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1), [alpindale/WizardLM-2-8x22B (141B)](https://huggingface.co/alpindale/WizardLM-2-8x22B), and [CohereForAI/c4ai-command-r-plus-08-2024 (104B)](https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024)
- Large models: [meta-llama/Llama-3.3-70B-Instruct (70.6B)](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) and [NousResearch/Hermes-3-Llama-3.1-70B (70.6B)](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-70B)
- Big models: [mistralai/Mixtral-8x7B-Instruct-v0.1 (46.7B)](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
- Medium-sized models: [google/gemma-2-27b (27.2B)](https://huggingface.co/google/gemma-2-27b) and [mistralai/Mistral-Small-Instruct-2409 (22.2B)](https://huggingface.co/mistralai/Mistral-Small-Instruct-2409)
- **Small-Sized Models (Current Main Focus)**:
  - [microsoft/phi-4 (14.7B)](https://huggingface.co/microsoft/phi-4)
  - [mistralai/Mistral-Nemo-Instruct-2407 (12.2B)](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)

Models rated 3 stars or lower are not listed here. We have tested dozens of models below 20B and even below 10B parameters, but most either do not understand and produce German well enough or do not perform adequately when answering technical support questions.

Some of the models above also come in smaller versions that are not listed because they did not achieve a 4+ rating. Conversely, some models (e.g., Hermes 3) have larger versions that are not included because their performance relative to model size was unimpressive, making the massive 405B variants less interesting for our purposes.

Given our goal of training, exporting, and running inference on our own dedicated server hardware, we primarily focus on models with fewer than 20B parameters while comparing their performance against that of some larger models.

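As a rough sanity check for that parameter-count cutoff: in 4-bit quantization the weights alone take about half a byte per parameter. A back-of-the-envelope sketch (weights only; KV cache, activations, and quantization overhead come on top):

```python
def weights_gib(n_params: float, bits: int = 4) -> float:
    """Approximate weight footprint in GiB for a given quantization bit width."""
    return n_params * bits / 8 / 2**30

# gemma-2-27b-it: roughly 12.7 GiB of 4-bit weights
print(round(weights_gib(27.2e9), 1))
# microsoft/phi-4: roughly 6.8 GiB of 4-bit weights
print(round(weights_gib(14.7e9), 1))
```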
## How We Fine-Tune Our Base Model

Due to our small training dataset and GPU VRAM constraints, we exclusively use QLoRA fine-tuning. After experimenting with our own scripts, we ultimately settled on [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), which meets our needs for straightforward training, inference, and export across a wide range of models.

### Training Data

Our training data currently consists of approximately **220 prompt-response pairs** in a carefully curated dataset.

To collect this training data, we developed an internal web app for our employees. In this app:

- Employees enter both a prompt and a response; each pair forms a single training record.
- Gamification via daily and weekly high-score lists highlights the employees who contribute the most training data.
- The web app is also connected to a selection of our current evaluation models, making it easy to test internally how the models respond both to prompts from the training data and to new prompts.

Here are three example records from our training data:

> **Prompt**: Wie kann ich eine Tour umbenennen?
> <br>**Response**: 1. Öffnen Sie die Stoppliste der Tour und klicken Sie auf das Tourmenü (graues Zahnradsymbol ⚙️) rechts oben in der Stoppliste.<br>2. Wählen Sie im Tourmenü "Tour umbenennen".<br>3. Folgen Sie der Anweisung im Dialog und klicken Sie auf "OK".<br>Beachten Sie: Fahrer können die Tour nicht umbenennen, sondern nur ein Disponent (Admin Account).

> **Prompt**: Wie werden Daten in Connect-Transport importiert?
> <br>**Response**: Der Flex Importer liest Auftragsdaten aus Excel-Dateien ein. Alternativ können Daten per Schnittstelle direkt aus ERP- oder Warenwirtschaftssystemen importiert werden.

> **Prompt**: Gibt es eine Möglichkeit zur Sendungsverfolgung für Endkunden?
> <br>**Response**: Ja, mit dem Avisierungs-Modul können Disponenten per SMS oder E-Mail voraussichtliche Lieferzeiten an Empfänger senden, die ihre Sendungen live verfolgen können.

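To train on these records with LLaMA-Factory, the `dataset: logicsct` entry in our config has to point at a dataset file registered in LLaMA-Factory's `dataset_info.json`. A minimal sketch of the export step, assuming the common alpaca-style JSON layout (variable and file names here are illustrative, not our actual pipeline):

```python
import json

# Hypothetical in-memory export of prompt-response pairs from the web app.
pairs = [
    {
        "prompt": "Wie werden Daten in Connect-Transport importiert?",
        "response": (
            "Der Flex Importer liest Auftragsdaten aus Excel-Dateien ein. "
            "Alternativ können Daten per Schnittstelle direkt aus ERP- oder "
            "Warenwirtschaftssystemen importiert werden."
        ),
    },
]

# Alpaca-style records: one object per prompt-response training pair.
records = [
    {"instruction": p["prompt"], "input": "", "output": p["response"]}
    for p in pairs
]

with open("logicsct.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```

A `dataset_info.json` entry then maps the dataset name `logicsct` to this file.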
### QLoRA Settings

Full settings for `logicsct_train_gemma2it27b_qlora_sft_otfq.yaml`:

```yaml
### model
model_name_or_path: google/gemma-2-27b-it
quantization_bit: 4
quantization_method: bitsandbytes
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 128 # we are still experimenting with this
#lora_alpha # defaults to lora_rank * 2
lora_target: all

### dataset
dataset: logicsct
template: gemma
cutoff_len: 512
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/logicsct-gemma2it27b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 2.0e-4 # we are still experimenting with this
num_train_epochs: 4.0 # we are still experimenting with this
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.2 # use 20% of the dataset as the validation split
per_device_eval_batch_size: 1 # keep the evaluation batch size at 1 per device
eval_strategy: steps # or "epoch" to evaluate at the end of each epoch
eval_steps: 500 # with "steps", this determines the evaluation frequency
```

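For orientation, here is the training-schedule arithmetic these settings imply for our ~220-pair dataset, assuming a single GPU (with more GPUs, the effective batch size scales accordingly):

```python
import math

# Values taken from the YAML above; single-GPU assumption.
dataset_size = 220                              # approximate number of prompt-response pairs
train_records = int(dataset_size * (1 - 0.2))   # val_size: 0.2 -> 176 training records
effective_batch = 1 * 8                         # per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(train_records / effective_batch)  # 22
total_steps = steps_per_epoch * 4               # num_train_epochs: 4.0 -> 88 optimizer steps
```

With only about 88 optimizer steps in total, `eval_steps: 500` and `save_steps: 500` never fire mid-run, so evaluation and checkpointing effectively happen only at the end of training.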
### Training, Inference, and Export

We follow the instructions provided in the [LLaMA-Factory Quickstart Guide](https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#quickstart):

```bash
# 4-bit QLoRA training: 155144 MiB VRAM, 15982 MiB RAM
llamafactory-cli train logicsct_train_gemma2it27b_qlora_sft_otfq.yaml

# Inference with base model + QLoRA adapter: 66956 MiB VRAM, 6226 MiB RAM
llamafactory-cli chat logicsct_inference_gemma2it27b_qlora_sft_otfq.yaml

# Export a merged version of the model with its adapter:
# 3927 MiB VRAM, 58223 MiB RAM in CPU mode; alternatively 66900 MiB VRAM, 8822 MiB RAM in GPU mode
llamafactory-cli export logicsct_export_gemma2it27b_qlora_sft.yaml

# 4-bit quantized export of the merged model: 79270 MiB VRAM, 59852 MiB RAM
llamafactory-cli export logicsct_export_gemma2it27b_qlora_sft_Q4.yaml

# Inference with the 4-bit quantized merged model: 21294 MiB VRAM, 4678 MiB RAM
llamafactory-cli chat logicsct_inference_gemma2it27b_qlora_sft_otfq_Q4.yaml
```

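LLaMA-Factory applies `template: gemma` automatically during training and chat. If the merged export is prompted from custom code instead, the input must follow Gemma's chat format; a minimal sketch with the turn markers as we understand the Gemma template (verify against the tokenizer's chat template before relying on it):

```python
def build_gemma_prompt(user_message: str) -> str:
    # Gemma chat format: a user turn followed by an opened model turn
    # that the model is expected to complete. The <bos> token is
    # typically prepended by the tokenizer itself.
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = build_gemma_prompt("Wie kann ich eine Tour umbenennen?")
```

In practice, calling `tokenizer.apply_chat_template` on the exported model is the more robust way to produce this string.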
### Comparison of Open-Source Training/Models with OpenAI Proprietary Fine-Tuning

We have fine-tuned both OpenAI GPT-4o and GPT-4o-mini and compared their performance to that of our best small-sized models. After some initial runs with unsatisfactory results, we significantly adjusted the hyperparameters and focused primarily on experimenting with 4o-mini.

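OpenAI's fine-tuning endpoint expects training data as chat-format JSONL (one `{"messages": [...]}` object per line) rather than the alpaca-style JSON we use with LLaMA-Factory; a minimal reshaping sketch with a hypothetical file name:

```python
import json

# The same prompt-response pairs, one chat-format object per JSONL line.
pairs = [
    {
        "prompt": "Gibt es eine Möglichkeit zur Sendungsverfolgung für Endkunden?",
        "response": (
            "Ja, mit dem Avisierungs-Modul können Disponenten per SMS oder "
            "E-Mail voraussichtliche Lieferzeiten an Empfänger senden, die "
            "ihre Sendungen live verfolgen können."
        ),
    },
]

with open("logicsct_openai.jsonl", "w", encoding="utf-8") as f:
    for p in pairs:
        record = {
            "messages": [
                {"role": "user", "content": p["prompt"]},
                {"role": "assistant", "content": p["response"]},
            ]
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```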
With our current training data, both 4o and 4o-mini appear to require 5 epochs at the default learning rate, with the training loss approaching zero. With fewer epochs, the models seem not to learn enough, perhaps due to the small size of our training dataset. Significant overfitting sets in at approximately 7 epochs for both models.

Our best settings so far are:

- Epochs: 5
- Batch size: 3
- Learning rate: automatically determined

Currently, our small-sized open-source models perform comparably to, or even better than, the fine-tuned 4o-mini. We will resume testing OpenAI fine-tuning once we have a larger training dataset.

## Next Steps

Our top priority at the moment is to collect more training data.
|