tao-shen committed
Commit 99c70fa · verified · 1 Parent(s): 43fbed7

Upload LoRA adapter

Files changed (1)
  1. README.md +96 -10
README.md CHANGED
@@ -8,19 +8,105 @@ tags:
  datasets:
  - vicgalle/alpaca-gpt4
  ---
- # FlowerTune LoRA Model
- This is a LoRA adapter for meta-llama/Llama-3.1-8B-Instruct fine-tuned with Flower federated learning framework on a general NLP dataset.
- ## Training Details
- - Dataset: vicgalle/alpaca-gpt4
- - Training method: Federated LoRA fine-tuning with FlowerTune
- - Framework: Flower
- This model is a LoRA adapter fine-tuned on meta-llama/Llama-3.1-8B-Instruct using the Flower federated learning framework. It was trained on a general NLP dataset (vicgalle/alpaca-gpt4) through distributed learning to improve performance.
- ## Links
- - FlowerTune Homepage: [https://huggingface.co/zjudai/FlowerTune](https://huggingface.co/zjudai/FlowerTune)
- - FlowerTune Collection: [https://huggingface.co/collections/zjudai/flowertune-lora-collection-67ecd5d0dae6145cbf798439](https://huggingface.co/collections/zjudai/flowertune-lora-collection-67ecd5d0dae6145cbf798439)
+ # FlowerTune LLM on General NLP Dataset
+
+ This project performs federated instruction tuning of pretrained language models on a general NLP dataset, [vicgalle/alpaca-gpt4](https://huggingface.co/datasets/vicgalle/alpaca-gpt4).
+ We use [Flower Datasets](https://flower.dev/docs/datasets/) to download, partition, and preprocess the dataset.
+ Flower's Simulation Engine is used to simulate the LLM fine-tuning process in a federated way,
+ which allows users to perform the training on a single GPU.
+
+ ## Links
+
+ - **GitHub Repository**: [https://github.com/zjudai/flwr-nlp](https://github.com/zjudai/flwr-nlp)
+ - **Hugging Face Homepage**: [https://huggingface.co/zjudai/FlowerTune](https://huggingface.co/zjudai/FlowerTune)
+ - **FlowerTune Collection**: [https://huggingface.co/collections/zjudai/flowertune-lora-collection-67ecd5d0dae6145cbf798439](https://huggingface.co/collections/zjudai/flowertune-lora-collection-67ecd5d0dae6145cbf798439)
+
+ ## Experimental Setup
+
+ The dataset is divided into 20 partitions in an IID fashion, and one partition is assigned to each ClientApp.
+ We randomly sample a fraction (`0.1`) of the total nodes to participate in each round, for a total of `10` rounds.
+ All settings are defined in `pyproject.toml`.
+
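+ As an illustration of this partitioning, the sketch below shows how a 20-way IID split of the dataset could be built with Flower Datasets; it is an assumption-based example, not the project's exact code, which lives in the repository and `pyproject.toml`.
+
+ ```python
+ # Illustrative sketch (not the project's exact code): load vicgalle/alpaca-gpt4
+ # and split it into 20 IID client partitions with Flower Datasets.
+ from flwr_datasets import FederatedDataset
+ from flwr_datasets.partitioner import IidPartitioner
+
+ NUM_PARTITIONS = 20  # one partition per ClientApp, as described above
+
+ fds = FederatedDataset(
+     dataset="vicgalle/alpaca-gpt4",
+     partitioners={"train": IidPartitioner(num_partitions=NUM_PARTITIONS)},
+ )
+
+ # Each client would then load only its own shard, e.g. partition 0:
+ partition = fds.load_partition(0, "train")
+ print(partition)
+ ```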
 
+ ## Methodology
+
+ This baseline performs federated LLM fine-tuning with [LoRA](https://arxiv.org/abs/2106.09685) using the [🤗PEFT](https://huggingface.co/docs/peft/en/index) library.
+ The clients' models are aggregated with the `FedAvg` strategy.
+ This provides a baseline performance for general NLP tasks, with evaluation on the MMLU benchmark.
+
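+ As a point of reference, the snippet below is a minimal sketch of how a `FedAvg` strategy matching the setup above (20 clients, 10% sampled per round, 10 rounds) could be declared with Flower's Python API; any parameter not listed in this card is an assumption, and the project's real server configuration is defined in its own code.
+
+ ```python
+ # Minimal sketch (assumed wiring, not the project's code): a FedAvg strategy
+ # that samples 10% of 20 clients per round, for 10 federated rounds.
+ from flwr.server import ServerConfig
+ from flwr.server.strategy import FedAvg
+
+ strategy = FedAvg(
+     fraction_fit=0.1,          # sample 10% of available clients each round
+     fraction_evaluate=0.0,     # no federated evaluation in this sketch
+     min_available_clients=20,  # one client per dataset partition
+ )
+
+ server_config = ServerConfig(num_rounds=10)  # total number of federated rounds
+ ```
+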
+ ### Example: Qwen2.5-7B-Instruct
+
+ For example, with the **Qwen/Qwen2.5-7B-Instruct** model we adopted the following fine-tuning methodology (a code sketch follows the list):
+
+ - **Precision**: `bf16` for model weights.
+ - **Quantization**: `4-bit` quantization for reduced memory usage.
+ - **LoRA Configuration**:
+   - Rank (r): `32`
+   - Alpha: `64`
+ - **Training Configuration**:
+   - Batch size: `8`
+   - Maximum number of steps: `10`
+   - Total number of rounds: `10`
+   - Fraction fit per round: `0.1`
+ - **Learning Rate Scheduler**:
+   - Maximum LR: `5e-5`
+   - Minimum LR: `1e-6`
+   - Constant learning rate scheduler over steps
+ - **Strategy**: `FedAvg`
+
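+ The following sketch shows how a model could be loaded and wrapped to match these settings with 🤗 Transformers and PEFT. The quantization type, LoRA target modules, and dropout are illustrative assumptions; only the values listed above come from this card.
+
+ ```python
+ # Illustrative only: load Qwen2.5-7B-Instruct in 4-bit with bf16 compute and
+ # attach a LoRA adapter with r=32 and alpha=64 (other values are assumptions).
+ import torch
+ from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+ from peft import LoraConfig, get_peft_model
+
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,                      # 4-bit quantization
+     bnb_4bit_quant_type="nf4",              # assumed quantization type
+     bnb_4bit_compute_dtype=torch.bfloat16,  # bf16 compute
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "Qwen/Qwen2.5-7B-Instruct",
+     quantization_config=bnb_config,
+     torch_dtype=torch.bfloat16,             # bf16 model weights
+ )
+
+ lora_config = LoraConfig(
+     r=32,
+     lora_alpha=64,
+     lora_dropout=0.05,                      # assumed
+     target_modules=["q_proj", "v_proj"],    # assumed
+     task_type="CAUSAL_LM",
+ )
+
+ model = get_peft_model(model, lora_config)
+ model.print_trainable_parameters()
+ ```
+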
+ ## Environment and Execution
+
+ ### Environment Setup
+
+ Project dependencies are defined in `pyproject.toml`. Install them in an activated Python environment with:
+
+ ```shell
+ python -m pip install --upgrade pip wheel setuptools packaging
+
+ pip install -e .
+ ```
+
+ ### Running the Training and Evaluation
+
+ We use a wrapper script, `run_all_experiments.sh`, to handle both the training and evaluation processes:
+
+ ```bash
+ # Example of running experiments
+ ./run_all_experiments.sh --model Qwen/Qwen2.5-7B-Instruct --task general_nlp
+ ```
+
+ The wrapper script sets up the proper environment, including:
+
+ - Activating the conda environment
+ - Setting up proxy configuration if needed
+ - Executing the main experiment runner script with the provided parameters
+
+ The actual experiment workflow is implemented in `run_experiments.py`, which is called by the wrapper script.
+
+ ### Model Saving
+
+ By default, the global PEFT model checkpoints are saved on the server side every 5 rounds after aggregation. The save interval can be changed via `train.save-every-round` under the `[tool.flwr.app.config]` entry in `pyproject.toml`.
+
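+ As a rough illustration of this behaviour (the function and variable names here are hypothetical, not taken from the project), the server-side saving step might look like:
+
+ ```python
+ # Hypothetical sketch of periodic server-side checkpointing of the global
+ # PEFT adapter; `global_model` and `save_path` are illustrative names.
+ def maybe_save_global_adapter(global_model, server_round: int,
+                               save_every_round: int = 5,
+                               save_path: str = "checkpoints") -> None:
+     """Save the aggregated PEFT adapter every `save_every_round` rounds."""
+     if server_round % save_every_round == 0:
+         # PeftModel.save_pretrained writes only the adapter weights and config.
+         global_model.save_pretrained(f"{save_path}/round_{server_round}")
+ ```
+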
+ ## Evaluation Results
+
+ The evaluation was conducted on the MMLU (Massive Multitask Language Understanding) benchmark, which tests knowledge across various domains:
+
+ | **Model** | **STEM** | **Social Sciences** | **Humanities** | **Average** |
+ |-----------|----------|---------------------|----------------|-------------|
+ | Qwen/Qwen2.5-7B-Instruct | 52.52% | 79.27% | 60.32% | 64.04% |
+ | Qwen/Qwen2.5-1.5B-Instruct | 47.13% | 62.30% | 50.54% | 53.32% |
+ | mistralai/Mistral-7B-Instruct-v0.3 | 29.94% | 54.27% | 44.93% | 43.05% |
+ | meta-llama/Llama-3.1-8B-Instruct | 22.87% | 39.55% | 32.05% | 31.49% |
+ | mistralai/Mistral-7B-v0.3 | 12.59% | 31.13% | 27.10% | 23.61% |
+ | TinyLlama/TinyLlama-1.1B-Chat-v1.0 | 14.18% | 21.61% | 21.91% | 19.23% |
+ | meta-llama/Llama-3.2-1B-Instruct | 12.88% | 17.61% | 6.16% | 12.22% |
+ | google/gemma-3-1b-it | 0.10% | 0.49% | 0.15% | 0.24% |
+ | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | 0.54% | 0.00% | 0.04% | 0.19% |
+
+ ## Hardware Details
+
+ For this experiment, we used a GPU-enabled virtual machine.
+
+ | **Component** | **Specification** |
+ |---------------|-------------------|
+ | **GPU** | 1 × GPU with 16+ GB VRAM |
+ | **vCPUs** | 6 |
+ | **Memory (RAM)** | 16+ GB |