tu-ericngo committed
Commit 3c7c34d · verified · 1 Parent(s): 7a37724

Update README.md

Files changed (1):
1. README.md +7 -11
README.md CHANGED
@@ -75,25 +75,21 @@ Data for the fine-tuning comes from 2 sources: (1) manual collection and (2) sy
The data is structured in the Alpaca format, with each training example consisting of a Prompt (the task description, the JSON schema, and a one-shot example), an Input (an elite's biographical text), and an Output (the target JSON record).
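
For concreteness, one training example might look like the sketch below. The `instruction`/`input`/`output` keys follow the common Alpaca convention; the schema fields and the biography shown here are invented placeholders, not drawn from the actual dataset:

```python
# A minimal sketch of one Alpaca-format training example.
# The key names follow the common Alpaca convention; the schema,
# one-shot example, and biography are hypothetical placeholders.
example = {
    "instruction": (
        "Extract a structured record from the biographical text. "
        'Return JSON matching the schema {"name": str, "birth_year": int, '
        '"positions": [str]}. One-shot example: {"name": "Jane Doe", '
        '"birth_year": 1950, "positions": ["Minister of Finance"]}.'
    ),
    "input": "John Smith (born 1962) served two terms as governor before ...",
    "output": '{"name": "John Smith", "birth_year": 1962, "positions": ["Governor"]}',
}
```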
 
  ### Training Procedure
 
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Training Hyperparameters
 
- **Training regime:** bf16 non-mixed precision (a configuration sketch follows below)
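
The card states the precision regime but not the surrounding trainer setup; the following is a minimal sketch, assuming a Hugging Face `TrainingArguments`-based run, of how full (non-mixed) bf16 training with the reported batch settings is typically requested. The base-model identifier is a placeholder, not taken from this section:

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments

# "Non-mixed" bf16 usually means the weights themselves are kept in
# bfloat16 (no fp32 master copy), rather than autocasting over fp32.
model = AutoModelForCausalLM.from_pretrained(
    "<base-model-id>",               # placeholder; see the card for the actual base model
    torch_dtype=torch.bfloat16,      # load weights directly in bf16
)

args = TrainingArguments(
    output_dir="outputs",
    num_train_epochs=3,              # matches the reported 3 epochs
    per_device_train_batch_size=2,   # matches the reported per-device batch size
    gradient_accumulation_steps=4,   # 2 x 4 x 1 GPU -> effective batch size of 8
    bf16=True,                       # run compute in bf16 as well
)
```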
 
#### Speeds, Sizes, Times
 
<!-- This section provides information about throughput, start/end time, etc. -->

- Num Epochs = 3 | Total steps = 99
- Batch size per device = 2 | Gradient accumulation steps = 4
- Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
- Trainable parameters = 83,886,080 / 8,000,000,000 (1.05% trained)
- Training time = 38.48 minutes
- Peak reserved memory = 10.107 GB
- Peak reserved memory for training = 4.189 GB
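
As a quick sanity check, these figures are mutually consistent; the arithmetic below only restates numbers already reported above (the implied dataset size is an inference, hedged in the comments):

```python
# Sanity-check the reported run statistics (plain arithmetic, no training code).
per_device_batch, grad_accum, dp_gpus = 2, 4, 1
effective_batch = per_device_batch * grad_accum * dp_gpus
print(effective_batch)  # 8, matching the reported total batch size

trainable, total = 83_886_080, 8_000_000_000
print(f"{100 * trainable / total:.2f}% trained")  # 1.05% trained, as reported

# 99 optimizer steps over 3 epochs is 33 steps per epoch; at an effective
# batch of 8 that implies roughly 33 * 8 = 264 training examples
# (an inference that assumes full batches and no dropped remainder).
epochs, total_steps = 3, 99
print(total_steps / epochs * effective_batch)  # 264.0
```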
 
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/67b5c53dd6a178c46d7f3767/mARFkSRyxxliZXLyc36kt.png)