aquif-neo-2-345m-c1

This is the first checkpoint of the 'aquif-neo-2-345m' model, a next-generation language model developed by aquif AI. This checkpoint is fine-tuned on a diverse dataset including conversational, code, and math data, serving as the initial step in a 5-checkpoint training process designed to create a versatile and capable model.

Model Details

Base Model: gpt2-medium
Method: LoRA (Low-Rank Adaptation)
Parameter Count: 355 million

Training Information

This checkpoint was trained as the first stage of a multi-checkpoint process. The training was performed using a network-resilient script that includes fallback mechanisms for data loading and model initialization.
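As an illustration of what such a fallback might look like, here is a minimal sketch; the actual script, dataset names, and retry logic are not published, so every identifier below is hypothetical:

```python
from datasets import load_dataset

# Hypothetical illustration of the fallback pattern described above; the
# real script's dataset names and retry logic are not published.
def load_with_fallback(primary, fallback, split="train"):
    try:
        return load_dataset(primary, split=split)
    except Exception as err:  # e.g. a transient network error on Colab
        print(f"Loading {primary} failed ({err}); trying {fallback} instead")
        return load_dataset(fallback, split=split)
```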

Checkpoint Number: 1/5
Hardware: Google Colab T4 GPU.
Training Duration: Approximately 2.5 hours for this checkpoint.
Training Framework: PyTorch, Hugging Face Transformers, PEFT, bitsandbytes, TRL.
Quantization: 8-bit (via bitsandbytes).
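A minimal sketch of how the 8-bit base model load might look with bitsandbytes; the exact arguments used in the original script are not published:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the gpt2-medium base in 8-bit so the 355M-parameter model
# fits comfortably in the T4's 16 GB of memory
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
base_model = AutoModelForCausalLM.from_pretrained(
    "gpt2-medium",
    quantization_config=bnb_config,
    device_map="auto",
)
```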

LoRA Configuration:

r=8
lora_alpha=16
target_modules=["q_attn", "c_attn", "c_proj", "c_fc", "attn.c_attn", "attn.c_proj", "mlp.c_fc", "mlp.c_proj"]
lora_dropout=0.05
bias="none"
task_type="CAUSAL_LM"

Training Arguments:
per_device_train_batch_size=2
gradient_accumulation_steps=16
num_train_epochs=1 (for this checkpoint)
learning_rate=1e-5
max_steps=400

The run was optimized for 8-bit training to fit within the T4's memory.
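Put together, the values listed above would look roughly like this in PEFT and transformers. This is a reconstruction from the reported settings, not the original script; the output_dir is hypothetical:

```python
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    # target modules reproduced verbatim from the card (note some entries overlap)
    target_modules=["q_attn", "c_attn", "c_proj", "c_fc",
                    "attn.c_attn", "attn.c_proj", "mlp.c_fc", "mlp.c_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="aquif-neo-2-345m-c1",  # hypothetical output path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    learning_rate=1e-5,
    max_steps=400,  # when set, max_steps takes precedence over num_train_epochs
)

# The adapter would then be attached with peft.get_peft_model(base_model, lora_config)
# before handing everything to a Trainer or TRL SFTTrainer.
```

With per_device_train_batch_size=2 and gradient_accumulation_steps=16, the effective batch size is 32, and max_steps=400 caps the run regardless of num_train_epochs.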

Training Loss Data

The following table shows the training loss recorded during the training of this checkpoint:

| Step | Training Loss |
|-----:|--------------:|
|   20 | 3.4444 |
|   40 | 3.4754 |
|   60 | 3.4954 |
|   80 | 3.4213 |
|  100 | 3.3338 |
|  120 | 3.1749 |
|  140 | 3.2208 |
|  160 | 3.0503 |
|  180 | 2.9293 |
|  200 | 2.8377 |
|  220 | 2.8094 |
|  240 | 2.7225 |
|  260 | 2.6260 |
|  280 | 2.7452 |
|  300 | 2.6614 |
|  320 | 2.5056 |
|  340 | 2.5391 |
|  360 | 2.5115 |
|  380 | 2.4892 |
|  400 | 2.5117 |

*Note: Training loss measures how well the model fits its training data. The decrease from 3.44 at step 20 to roughly 2.5 by step 400 indicates steady learning over this checkpoint.*

Intended Use

This checkpoint is an intermediate model in the development of the full 'aquif-neo-2'. It is not intended for production use but serves as a foundation for subsequent fine-tuning checkpoints focusing on specific domains and tasks.

How to Load the Model

You can load this model using the Hugging Face 'transformers' library:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "aquiffoo/aquif-neo-2-345m-c1"

# Download the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
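Continuing from the snippet above, a quick generation check might look like this; the prompt and sampling settings are illustrative, not from the model card:

```python
inputs = tokenizer("The history of mathematics begins", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```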

Future Checkpoints

This is the first of 5 planned checkpoints. Future checkpoints will continue to fine-tune the model on additional data to improve its capabilities across various domains.

License: Apache 2.0
