---
license: gemma
language:
- de
base_model:
- google/gemma-2-27b-it
pipeline_tag: question-answering
tags:
- Connect-Transport
- Logics Software
- German support chatbot
- Deutscher KI Chatbot
- Kundenservice Chatbot
- Deutscher Chatbot
- KI-Chatbots für Unternehmen
- Chatbot for SMEs
- Question-answering
- QLoRA fine-tuning
- LLM training
library_name: transformers
---

# Model Card for logicsct-logicsct-gemma2it27b
**logicsct-logicsct-gemma2it27b** is a QLoRA 4-bit fine-tuned version of [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it). This model has been adapted with domain-specific knowledge to serve as a support chatbot for [**Connect-Transport**](https://www.logics-connect.de), our transport management system developed at Logics Software GmbH.

While tailored for our internal use, the training principles and techniques we employed can also be applied by others interested in developing their own chatbot assistants.

We are continuously evaluating and refining our models to enhance the performance of our support chatbot for Connect-Transport.
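
Outside of LLaMA-Factory, the exported merged checkpoint can be loaded with 🤗 Transformers and bitsandbytes. The snippet below is a minimal sketch: the repository id is a placeholder and should be replaced with the actual model repository.

```python
# Minimal usage sketch. Assumption: the merged model is published under the
# placeholder repo id below; adjust it to the actual repository name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "logics-software/logicsct-gemma2it27b"  # placeholder repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Gemma 2 chat format via the tokenizer's chat template.
messages = [{"role": "user", "content": "Wie kann ich eine Tour umbenennen?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```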

## Finding a Good Base Model – Proficient in German and Following Instructions
We have evaluated over 70 models for basic technical instruction tasks in German. The evaluation was carried out manually by reviewing the responses to the following questions:

- Wie kann ich in Chrome machen dass meine Downloads immer am gleichen Ort gespeichert werden? *(How can I make Chrome always save my downloads to the same location?)*
- Wie kann ich in Outlook meine Mail Signatur anpassen und einen Link und Bild dort einfügen? *(How can I adjust my email signature in Outlook and insert a link and an image there?)*

The best models according to our subjective rating scale (1 = poor, 5 = excellent) are:

5-Star Rating:
- Big proprietary models such as OpenAI o1, GPT-4o, and o1-mini
- Huge models: [deepseek-ai/DeepSeek-R1 (685B)](https://huggingface.co/deepseek-ai/DeepSeek-R1), [deepseek-ai/DeepSeek-V3 (685B)](https://huggingface.co/deepseek-ai/DeepSeek-V3) and [mistralai/Mistral-Large-Instruct-2411 (123B)](https://huggingface.co/mistralai/Mistral-Large-Instruct-2411)
- Large models: [Nexusflow/Athene-V2-Chat (72.7B)](https://huggingface.co/Nexusflow/Athene-V2-Chat) and [nvidia/Llama-3.1-Nemotron-70B-Instruct (70.6B)](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct)

4-Star Rating:
- Huge models: [mistralai/Mixtral-8x22B-Instruct-v0.1 (141B)](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1), [alpindale/WizardLM-2-8x22B (141B)](https://huggingface.co/alpindale/WizardLM-2-8x22B) and [CohereForAI/c4ai-command-r-plus-08-2024 (104B)](https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024)
- Large models: [meta-llama/Llama-3.3-70B-Instruct (70.6B)](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) and [NousResearch/Hermes-3-Llama-3.1-70B (70.6B)](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-70B)
- Big models: [mistralai/Mixtral-8x7B-Instruct-v0.1 (46.7B)](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
- Medium-sized models: [google/gemma-2-27b (27.2B)](https://huggingface.co/google/gemma-2-27b) and [mistralai/Mistral-Small-Instruct-2409 (22.2B)](https://huggingface.co/mistralai/Mistral-Small-Instruct-2409)
- **Small-sized models (current main focus)**:
  - [microsoft/phi-4 (14.7B)](https://huggingface.co/microsoft/phi-4)
  - [mistralai/Mistral-Nemo-Instruct-2407 (12.2B)](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)

Models rated 3 stars or lower are not listed here. We have tested dozens of models in the sub-20B and sub-10B ranges, but most do not understand or speak German well enough, or do not perform adequately when answering technical support-chatbot questions.

Some models also have smaller versions that are not listed above because they did not reach a 4-star rating. Conversely, some models (e.g., Hermes 3) have larger versions that are not included because their performance relative to model size was unimpressive, making the massive 405B variants less interesting for our purposes.

Given our goal of training, exporting, and running inference on our own dedicated server hardware, we primarily focus on models with fewer than 20B parameters while comparing their performance with that of some larger models.

## How We Fine-Tune Our Base Model
Due to our small training dataset and GPU VRAM constraints, we exclusively use QLoRA fine-tuning. After testing our own scripts, we ultimately settled on [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), which meets our needs for straightforward training, inference, and export functionality across a wide range of models.
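Conceptually, the QLoRA run that LLaMA-Factory drives for us corresponds roughly to the peft/bitsandbytes setup sketched below. This is not our training code; the `lora_alpha` value follows LLaMA-Factory's default of twice the rank, and mapping `lora_target: all` to `target_modules="all-linear"` is our reading of that option.

```python
# Conceptual sketch of the QLoRA setup: a 4-bit NF4 base model with LoRA
# adapters on all linear layers. Hyperparameters mirror the YAML further below.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-27b-it",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

lora_config = LoraConfig(
    r=128,                        # lora_rank in the YAML below
    lora_alpha=256,               # assumption: LLaMA-Factory default of 2 * rank
    target_modules="all-linear",  # our reading of lora_target: all
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```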

### Training Data
Our training data currently consists of approximately **220 prompt-response pairs** in a carefully curated dataset.

To collect this training data, we developed an internal web app for our employees. In this app:
- Employees enter both a prompt and a response; each pair forms a single training record.
- Gamification is implemented via daily and weekly high-score lists that highlight the employee who contributes the most training data.
- The web app is also connected to a selection of the models currently under evaluation, making it easy to test internally how they respond both to prompts from the training data and to new prompts.

Here are three example records from our training data:

> **Prompt**: Wie kann ich eine Tour umbenennen?
> <br>**Response**: 1. Öffnen Sie die Stoppliste der Tour und klicken Sie auf das Tourmenü (graues Zahnradsymbol ⚙️) rechts oben in der Stoppliste.<br>2. Wählen Sie im Tourmenü "Tour umbenennen".<br>3. Folgen Sie der Anweisung im Dialog und klicken Sie auf "OK".<br>Beachten Sie: Fahrer können die Tour nicht umbenennen, sondern nur ein Disponent (Admin Account).

> **Prompt**: Wie werden Daten in Connect-Transport importiert?
> <br>**Response**: Der Flex Importer liest Auftragsdaten aus Excel-Dateien ein. Alternativ können Daten per Schnittstelle direkt aus ERP- oder Warenwirtschaftssystemen importiert werden.

> **Prompt**: Gibt es eine Möglichkeit zur Sendungsverfolgung für Endkunden?
> <br>**Response**: Ja, mit dem Avisierungs-Modul können Disponenten per SMS oder E-Mail voraussichtliche Lieferzeiten an Empfänger senden, die ihre Sendungen live verfolgen können.
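LLaMA-Factory resolves the `dataset: logicsct` entry used in the training YAML below via `data/dataset_info.json`. The sketch below shows how such records could be stored in the alpaca-style format; the file names and the single-record layout are illustrative assumptions, not a prescription.

```python
# Sketch: one training record in LLaMA-Factory's alpaca-style format, plus the
# dataset registration. File names ("logicsct.json") are illustrative choices.
import json

record = {
    "instruction": "Wie kann ich eine Tour umbenennen?",  # prompt
    "input": "",                                          # unused in our records
    "output": (
        "1. Öffnen Sie die Stoppliste der Tour und klicken Sie auf das Tourmenü "
        "(graues Zahnradsymbol) rechts oben in der Stoppliste.\n"
        "2. Wählen Sie im Tourmenü \"Tour umbenennen\".\n"
        "3. Folgen Sie der Anweisung im Dialog und klicken Sie auf \"OK\"."
    ),
}

with open("data/logicsct.json", "w", encoding="utf-8") as f:
    json.dump([record], f, ensure_ascii=False, indent=2)

# Entry that makes `dataset: logicsct` resolvable. In practice this key is
# added to the existing data/dataset_info.json rather than overwriting it.
dataset_info = {"logicsct": {"file_name": "logicsct.json"}}
with open("data/dataset_info.json", "w", encoding="utf-8") as f:
    json.dump(dataset_info, f, ensure_ascii=False, indent=2)
```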

### QLoRA Settings
Full settings for `logicsct_train_gemma2it27b_qlora_sft_otfq.yaml`:
```yaml
### model
model_name_or_path: google/gemma-2-27b-it
quantization_bit: 4
quantization_method: bitsandbytes
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 128 # we are still experimenting with this
#lora_alpha # defaults to lora_rank * 2
lora_target: all

### dataset
dataset: logicsct
template: gemma
cutoff_len: 512
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/logicsct-gemma2it27b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 2.0e-4 # we are still experimenting with this
num_train_epochs: 4.0 # we are still experimenting with this
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.2 # use 20% of the dataset as the validation split
per_device_eval_batch_size: 1 # keep the evaluation batch size at 1 per device
eval_strategy: steps # or "epoch" to evaluate at the end of each epoch
eval_steps: 500 # evaluation frequency when eval_strategy is "steps"
```
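A quick sanity check on the step-related values above: with roughly 220 records, a 20% validation split, and an effective batch size of 8 (per-device batch 1 × gradient accumulation 8), an epoch is only a couple dozen optimizer steps, so the 500-step `save_steps`/`eval_steps` thresholds are never hit mid-run. A back-of-the-envelope calculation, assuming a single training GPU:

```python
# Back-of-the-envelope step count, assuming a single GPU. With more GPUs the
# effective batch size grows and the step count shrinks accordingly.
records = 220                                       # approximate dataset size
val_size = 0.2                                      # from the YAML above
train_samples = int(records * (1 - val_size))       # ~176 training samples
effective_batch = 1 * 8                             # per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = -(-train_samples // effective_batch)  # ceil division, ~22
total_steps = steps_per_epoch * 4                   # num_train_epochs = 4.0, ~88
print(steps_per_epoch, total_steps)
```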

### Training, Inference, and Export
We follow the instructions provided in the [LLaMA-Factory Quickstart Guide](https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#quickstart):

```bash
llamafactory-cli train logicsct_train_gemma2it27b_qlora_sft_otfq.yaml         # 155144 MiB VRAM, 15982 MiB RAM for 4-bit QLoRA training
llamafactory-cli chat logicsct_inference_gemma2it27b_qlora_sft_otfq.yaml      # 66956 MiB VRAM, 6226 MiB RAM for inference of the base model + QLoRA adapter
llamafactory-cli export logicsct_export_gemma2it27b_qlora_sft.yaml            # 3927 MiB VRAM, 58223 MiB RAM in CPU mode (alternatively 66900 MiB VRAM, 8822 MiB RAM in GPU mode) for exporting a merged version of the model with its adapter
llamafactory-cli export logicsct_export_gemma2it27b_qlora_sft_Q4.yaml         # 79270 MiB VRAM, 59852 MiB RAM for a 4-bit quantized export of the merged model
llamafactory-cli chat logicsct_inference_gemma2it27b_qlora_sft_otfq_Q4.yaml   # 21294 MiB VRAM, 4678 MiB RAM for inference of the 4-bit quantized merged model
```

### Comparison of Open-Source Training/Models with OpenAI's Proprietary Fine-Tuning
We have fine-tuned both GPT-4o and GPT-4o-mini and compared their performance to that of our best small-sized models. After some initial runs with unsatisfactory results, we significantly adjusted the hyperparameters and focused primarily on experimenting with GPT-4o-mini.

With our current training data, both GPT-4o and GPT-4o-mini appear to require 5 epochs at the default learning rate for the training loss to approach zero. With fewer epochs, the models seem not to learn sufficiently, perhaps due to the small size of our training dataset. Significant overfitting occurs at approximately 7 epochs for both models.

Our best settings so far are:
- Epochs: 5
- Batch size: 3
- Learning rate: automatically determined
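For reference, these settings map onto the OpenAI fine-tuning API roughly as sketched below; the training file id and the model snapshot name are placeholders, and omitting `learning_rate_multiplier` lets the API choose the learning rate automatically.

```python
# Sketch: launching a GPT-4o-mini fine-tuning job with our current best settings.
# The file id and model snapshot below are placeholders.
from openai import OpenAI

client = OpenAI()

job = client.fine_tuning.jobs.create(
    training_file="file-XXXXXXXX",     # placeholder: uploaded JSONL training file
    model="gpt-4o-mini-2024-07-18",    # placeholder snapshot name
    hyperparameters={
        "n_epochs": 5,
        "batch_size": 3,
        # learning_rate_multiplier omitted -> determined automatically
    },
)
print(job.id, job.status)
```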

Currently, our small-sized open-source models perform comparably to, or even better than, the fine-tuned GPT-4o-mini. We will continue testing OpenAI fine-tuning once we have a larger training dataset.

## Next Steps
Our top priority at the moment is to collect more training data.