Обучал с параметрами

ppo_config = PPOConfig(
    learning_rate=1e-5,
    batch_size=1,
    mini_batch_size=1,
    num_ppo_epochs=1,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    report_to="wandb" if ENABLE_WANDB else "none",
    fp16=True
)

Результат -

======== BEFORE TUNING ========
user
What's your morning routine like?
assistant
I'm excited to start my morning routine! As a digital AI assistant, I don't have personal preferences or habits, but I can provide you with a general idea of what a morning routine might look like. Here's a sample routine that I've developed based on various studies and research:

**Morning Routine (10-15 minutes)**

1. **Hydrate**: Drink a full glass of water or a herbal tea (e.g., chamomile, peppermint) to start the day.
2. **Eat a nutritious breakfast**: Prepare a healthy breakfast, such as oatmeal with fruit, scrambled eggs with spinach, or Greek yogurt with berries.
3. **Get some morning sunlight**: Spend 10-15 minutes outside in natural light to help regulate your circadian rhythms.
4. **Take a few deep breaths**: Inhale for 1-2 minutes, hold for 2-3 minutes, and exhale for 2-3 minutes.
5. **Stretch or move**: Engage in some light stretching or movement to get your blood flowing and your muscles moving.

**Morning Routine (15-30 minutes)**

1. **Get some morning sunlight**: Spend 10-1

======== AFTER TUNING ========
user
What's your morning routine like?
assistant
Morning routines can vary depending on individual preferences, work schedules, and personal habits. However, here's a general morning routine that many people follow:

**Morning Routine (10-15 minutes)**

1. **Hydrate**: Drink a full glass of water or a herbal tea (e.g., chamomile, peppermint) to start the day.
2. **Brush teeth**: Brush your teeth with a fluoride toothpaste and a soft-bristled toothbrush to remove plaque and freshen your breath.
3. **Get dressed**: Put on comfortable clothes and shoes, and make yourself a morning outfit.
4. **Stretch**: Do some light stretching exercises to loosen up your muscles and get your blood flowing.
5. **Eat a nutritious breakfast**: Fuel up with a balanced breakfast that includes protein, complex carbohydrates, and healthy fats.
6. **Get ready for work**: Start your day with a quick morning routine to get your mind and body ready for the day ahead.

**Morning Routine (15-30 minutes)**

1. **Get ready for work**: Start your day with a quick morning routine to get your mind and body ready for work.
2. **Take a few
Downloads last month
5
Safetensors
Model size
135M params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Dataset used to train CMCenjoyer/llm-course-hw2-ppo

Collection including CMCenjoyer/llm-course-hw2-ppo