tao-shen committed
Commit 99c70fa · verified · 1 Parent(s): 43fbed7

Upload LoRA adapter

Files changed (1)
  1. README.md +96 -10
README.md CHANGED
@@ -8,19 +8,105 @@ tags:
  datasets:
  - vicgalle/alpaca-gpt4
  ---
- # FlowerTune LoRA Model
- This is a LoRA adapter for meta-llama/Llama-3.1-8B-Instruct fine-tuned with Flower federated learning framework on a general NLP dataset.
- ## Training Details
- - Dataset: vicgalle/alpaca-gpt4
- - Training method: Federated LoRA fine-tuning with FlowerTune
- - Framework: Flower
- This model is a LoRA adapter fine-tuned on meta-llama/Llama-3.1-8B-Instruct using the Flower federated learning framework. It was trained on a general NLP dataset (vicgalle/alpaca-gpt4) through distributed learning to improve performance.
- ## Links
- - FlowerTune Homepage: [https://huggingface.co/zjudai/FlowerTune](https://huggingface.co/zjudai/FlowerTune)
- - FlowerTune Collection: [https://huggingface.co/collections/zjudai/flowertune-lora-collection-67ecd5d0dae6145cbf798439](https://huggingface.co/collections/zjudai/flowertune-lora-collection-67ecd5d0dae6145cbf798439)
+ # FlowerTune LLM on General NLP Dataset
+
+ This project performs federated instruction tuning of pretrained language models on a general NLP dataset, [vicgalle/alpaca-gpt4](https://huggingface.co/datasets/vicgalle/alpaca-gpt4).
+ We use [Flower Datasets](https://flower.dev/docs/datasets/) to download, partition, and preprocess the dataset.
+ Flower's Simulation Engine is used to simulate the LLM fine-tuning process in a federated way,
+ which allows users to perform the training on a single GPU.
+
+ ## Links
+
+ - **GitHub Repository**: [https://github.com/zjudai/flwr-nlp](https://github.com/zjudai/flwr-nlp)
+ - **Hugging Face Homepage**: [https://huggingface.co/zjudai/FlowerTune](https://huggingface.co/zjudai/FlowerTune)
+ - **FlowerTune Collection**: [https://huggingface.co/collections/zjudai/flowertune-lora-collection-67ecd5d0dae6145cbf798439](https://huggingface.co/collections/zjudai/flowertune-lora-collection-67ecd5d0dae6145cbf798439)
+
+ ## Experimental Setup
+
+ The dataset is divided into 20 partitions in an IID fashion, and one partition is assigned to each ClientApp.
+ We randomly sample a fraction (`0.1`) of the total nodes to participate in each round, for a total of `10` rounds.
+ All settings are defined in `pyproject.toml`.
+
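+ As an illustration of this partitioning, the sketch below shows how a 20-way IID split of the dataset could be built with Flower Datasets; it is an assumption-based example, not the project's exact code, which lives in the repository and `pyproject.toml`.
+
+ ```python
+ # Illustrative sketch (not the project's exact code): load vicgalle/alpaca-gpt4
+ # and split it into 20 IID client partitions with Flower Datasets.
+ from flwr_datasets import FederatedDataset
+ from flwr_datasets.partitioner import IidPartitioner
+
+ NUM_PARTITIONS = 20  # one partition per ClientApp, as described above
+
+ fds = FederatedDataset(
+     dataset="vicgalle/alpaca-gpt4",
+     partitioners={"train": IidPartitioner(num_partitions=NUM_PARTITIONS)},
+ )
+
+ # Each client would then load only its own shard, e.g. partition 0:
+ partition = fds.load_partition(0, "train")
+ print(partition)
+ ```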
 
+ ## Methodology
+
+ This baseline performs federated LLM fine-tuning with [LoRA](https://arxiv.org/abs/2106.09685) using the [🤗PEFT](https://huggingface.co/docs/peft/en/index) library.
+ The clients' models are aggregated with the `FedAvg` strategy.
+ This provides a baseline performance for general NLP tasks, with evaluation on the MMLU benchmark.
+
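+ As a point of reference, the snippet below is a minimal sketch of how a `FedAvg` strategy matching the setup above (20 clients, 10% sampled per round, 10 rounds) could be declared with Flower's Python API; any parameter not listed in this card is an assumption, and the project's real server configuration is defined in its own code.
+
+ ```python
+ # Minimal sketch (assumed wiring, not the project's code): a FedAvg strategy
+ # that samples 10% of 20 clients per round, for 10 federated rounds.
+ from flwr.server import ServerConfig
+ from flwr.server.strategy import FedAvg
+
+ strategy = FedAvg(
+     fraction_fit=0.1,          # sample 10% of available clients each round
+     fraction_evaluate=0.0,     # no federated evaluation in this sketch
+     min_available_clients=20,  # one client per dataset partition
+ )
+
+ server_config = ServerConfig(num_rounds=10)  # total number of federated rounds
+ ```
+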
+ ### Example: Qwen2.5-7B-Instruct
+
+ For example, with the **Qwen/Qwen2.5-7B-Instruct** model we adopted the following fine-tuning methodology (a code sketch follows the list):
+
+ - **Precision**: `bf16` for model weights.
+ - **Quantization**: `4-bit` quantization for reduced memory usage.
+ - **LoRA Configuration**:
+   - Rank (r): `32`
+   - Alpha: `64`
+ - **Training Configuration**:
+   - Batch size: `8`
+   - Maximum number of steps: `10`
+   - Total number of rounds: `10`
+   - Fraction fit per round: `0.1`
+ - **Learning Rate Scheduler**:
+   - Maximum LR: `5e-5`
+   - Minimum LR: `1e-6`
+   - Constant learning rate scheduler over steps
+ - **Strategy**: `FedAvg`
+
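+ The following sketch shows how a model could be loaded and wrapped to match these settings with 🤗 Transformers and PEFT. The quantization type, LoRA target modules, and dropout are illustrative assumptions; only the values listed above come from this card.
+
+ ```python
+ # Illustrative only: load Qwen2.5-7B-Instruct in 4-bit with bf16 compute and
+ # attach a LoRA adapter with r=32 and alpha=64 (other values are assumptions).
+ import torch
+ from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+ from peft import LoraConfig, get_peft_model
+
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,                      # 4-bit quantization
+     bnb_4bit_quant_type="nf4",              # assumed quantization type
+     bnb_4bit_compute_dtype=torch.bfloat16,  # bf16 compute
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "Qwen/Qwen2.5-7B-Instruct",
+     quantization_config=bnb_config,
+     torch_dtype=torch.bfloat16,             # bf16 model weights
+ )
+
+ lora_config = LoraConfig(
+     r=32,
+     lora_alpha=64,
+     lora_dropout=0.05,                      # assumed
+     target_modules=["q_proj", "v_proj"],    # assumed
+     task_type="CAUSAL_LM",
+ )
+
+ model = get_peft_model(model, lora_config)
+ model.print_trainable_parameters()
+ ```
+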
+ ## Environment and Execution
+
+ ### Environment Setup
+
+ Project dependencies are defined in `pyproject.toml`. Install them in an activated Python environment with:
+
+ ```shell
+ python -m pip install --upgrade pip wheel setuptools packaging
+
+ pip install -e .
+ ```
+
+ ### Running the Training and Evaluation
+
+ We use a wrapper script, `run_all_experiments.sh`, to handle both the training and evaluation processes:
+
+ ```bash
+ # Example of running experiments
+ ./run_all_experiments.sh --model Qwen/Qwen2.5-7B-Instruct --task general_nlp
+ ```
+
+ The wrapper script sets up the proper environment, including:
+
+ - Activating the conda environment
+ - Setting up proxy configuration if needed
+ - Executing the main experiment runner script with the provided parameters
+
+ The actual experiment workflow is implemented in `run_experiments.py`, which is called by the wrapper script.
+
+ ### Model Saving
+
+ By default, the global PEFT model checkpoints are saved on the server side every 5 rounds after aggregation. The save interval can be changed via `train.save-every-round` under the `[tool.flwr.app.config]` entry in `pyproject.toml`.
+
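+ As a rough illustration of this behaviour (the function and variable names here are hypothetical, not taken from the project), the server-side saving step might look like:
+
+ ```python
+ # Hypothetical sketch of periodic server-side checkpointing of the global
+ # PEFT adapter; `global_model` and `save_path` are illustrative names.
+ def maybe_save_global_adapter(global_model, server_round: int,
+                               save_every_round: int = 5,
+                               save_path: str = "checkpoints") -> None:
+     """Save the aggregated PEFT adapter every `save_every_round` rounds."""
+     if server_round % save_every_round == 0:
+         # PeftModel.save_pretrained writes only the adapter weights and config.
+         global_model.save_pretrained(f"{save_path}/round_{server_round}")
+ ```
+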
+ ## Evaluation Results
+
+ The evaluation was conducted on the MMLU (Massive Multitask Language Understanding) benchmark, which tests knowledge across various domains:
+
+ | **Model** | **STEM** | **Social Sciences** | **Humanities** | **Average** |
+ |-----------|----------|---------------------|----------------|-------------|
+ | Qwen/Qwen2.5-7B-Instruct | 52.52% | 79.27% | 60.32% | 64.04% |
+ | Qwen/Qwen2.5-1.5B-Instruct | 47.13% | 62.30% | 50.54% | 53.32% |
+ | mistralai/Mistral-7B-Instruct-v0.3 | 29.94% | 54.27% | 44.93% | 43.05% |
+ | meta-llama/Llama-3.1-8B-Instruct | 22.87% | 39.55% | 32.05% | 31.49% |
+ | mistralai/Mistral-7B-v0.3 | 12.59% | 31.13% | 27.10% | 23.61% |
+ | TinyLlama/TinyLlama-1.1B-Chat-v1.0 | 14.18% | 21.61% | 21.91% | 19.23% |
+ | meta-llama/Llama-3.2-1B-Instruct | 12.88% | 17.61% | 6.16% | 12.22% |
+ | google/gemma-3-1b-it | 0.10% | 0.49% | 0.15% | 0.24% |
+ | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | 0.54% | 0.00% | 0.04% | 0.19% |
+
+ ## Hardware Details
+
+ For this experiment, we used a GPU-enabled virtual machine.
+
+ | **Component** | **Specification** |
+ |---------------|-------------------|
+ | **GPU** | 1 × GPU with 16+ GB VRAM |
+ | **vCPUs** | 6 |
+ | **Memory (RAM)** | 16+ GB |