tu-ericngo committed
Commit 3c7c34d · verified · 1 Parent(s): 7a37724

Update README.md

Files changed (1):
1. README.md +7 -11
README.md CHANGED
@@ -75,25 +75,21 @@ Data for the fine-tuning comes from 2 sources: (1) manual collection and (2) sy
The data is structured in the Alpaca format, with each training example consisting of a Prompt (the task description, the JSON schema, and a one-shot example), an Input (an elite's biographical text), and an Output (the target JSON record).
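
For concreteness, one training example might look like the sketch below. The `instruction`/`input`/`output` keys follow the common Alpaca convention; the schema fields and the biography shown here are invented placeholders, not drawn from the actual dataset:

```python
# A minimal sketch of one Alpaca-format training example.
# The key names follow the common Alpaca convention; the schema,
# one-shot example, and biography are hypothetical placeholders.
example = {
    "instruction": (
        "Extract a structured record from the biographical text. "
        'Return JSON matching the schema {"name": str, "birth_year": int, '
        '"positions": [str]}. One-shot example: {"name": "Jane Doe", '
        '"birth_year": 1950, "positions": ["Minister of Finance"]}.'
    ),
    "input": "John Smith (born 1962) served two terms as governor before ...",
    "output": '{"name": "John Smith", "birth_year": 1962, "positions": ["Governor"]}',
}
```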
 
  ### Training Procedure
 
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Training Hyperparameters
 
- **Training regime:** bf16 non-mixed precision (a configuration sketch follows below)
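
The card states the precision regime but not the surrounding trainer setup; the following is a minimal sketch, assuming a Hugging Face `TrainingArguments`-based run, of how full (non-mixed) bf16 training with the reported batch settings is typically requested. The base-model identifier is a placeholder, not taken from this section:

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments

# "Non-mixed" bf16 usually means the weights themselves are kept in
# bfloat16 (no fp32 master copy), rather than autocasting over fp32.
model = AutoModelForCausalLM.from_pretrained(
    "<base-model-id>",               # placeholder; see the card for the actual base model
    torch_dtype=torch.bfloat16,      # load weights directly in bf16
)

args = TrainingArguments(
    output_dir="outputs",
    num_train_epochs=3,              # matches the reported 3 epochs
    per_device_train_batch_size=2,   # matches the reported per-device batch size
    gradient_accumulation_steps=4,   # 2 x 4 x 1 GPU -> effective batch size of 8
    bf16=True,                       # run compute in bf16 as well
)
```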
 
#### Speeds, Sizes, Times
 
<!-- This section provides information about throughput, start/end time, etc. -->

- Num Epochs = 3 | Total steps = 99
- Batch size per device = 2 | Gradient accumulation steps = 4
- Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
- Trainable parameters = 83,886,080 / 8,000,000,000 (1.05% trained)
- Training time = 38.48 minutes
- Peak reserved memory = 10.107 GB
- Peak reserved memory for training = 4.189 GB
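
As a quick sanity check, these figures are mutually consistent; the arithmetic below only restates numbers already reported above (the implied dataset size is an inference, hedged in the comments):

```python
# Sanity-check the reported run statistics (plain arithmetic, no training code).
per_device_batch, grad_accum, dp_gpus = 2, 4, 1
effective_batch = per_device_batch * grad_accum * dp_gpus
print(effective_batch)  # 8, matching the reported total batch size

trainable, total = 83_886_080, 8_000_000_000
print(f"{100 * trainable / total:.2f}% trained")  # 1.05% trained, as reported

# 99 optimizer steps over 3 epochs is 33 steps per epoch; at an effective
# batch of 8 that implies roughly 33 * 8 = 264 training examples
# (an inference that assumes full batches and no dropped remainder).
epochs, total_steps = 3, 99
print(total_steps / epochs * effective_batch)  # 264.0
```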
 
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/67b5c53dd6a178c46d7f3767/mARFkSRyxxliZXLyc36kt.png)