Training: Second Phase

#2
by tugstugi - opened

Hi, what is the difference between PowerInfer/LONGCOT-Refine-500K and PowerInfer/QWQ-LONGCOT-500K? Why was PowerInfer/LONGCOT-Refine-500K added in the second phase? Was PowerInfer/QWQ-LONGCOT-500K alone not enough?

Let's say we want to replicate the result with a 7B model. Do we need to train with both datasets in a single run?

Greetings

Good questions.
A related one of mine: is training on CPU only possible?

I want to know more details about the training. Is there any difference between training a reasoning model and fine-tuning a general model? Or can it be achieved simply by following the usual fine-tuning steps but with different training datasets?

PowerInfer org

For more challenging questions, QWQ tends to use longer chains of thought to answer. For example, in QWQ-LONGCOT-500K most of the answers exceed 8K tokens, and most of the questions are related to mathematics and code. To add other domains and construct some shorter responses, we built LONGCOT-Refine-500K and then used the two datasets together for the second stage of SFT.
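
To make the single-run question above concrete, here is a minimal sketch of a second-stage SFT over the concatenation of both datasets, assuming the Hugging Face `datasets` and `trl` libraries; the base model, column names, and hyperparameters are illustrative assumptions, not the authors' exact recipe.

```python
# Minimal sketch: mix both SFT corpora and train in a single run.
# Column names, base model, and hyperparameters are assumptions.
from datasets import load_dataset, concatenate_datasets
from trl import SFTConfig, SFTTrainer

qwq = load_dataset("PowerInfer/QWQ-LONGCOT-500K", split="train")
refine = load_dataset("PowerInfer/LONGCOT-Refine-500K", split="train")

# Interleave long math/code chains of thought with the shorter
# multi-domain responses by concatenating and shuffling.
mixed = concatenate_datasets([qwq, refine]).shuffle(seed=42)

def to_text(example):
    # Assumed prompt/response field names; adjust to the datasets' actual schema.
    return {"text": f"{example['prompt']}\n{example['response']}"}

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",  # example 7B base, not necessarily the authors' choice
    train_dataset=mixed.map(to_text),
    args=SFTConfig(
        output_dir="longcot-stage2",
        max_seq_length=16384,            # long enough for >8K-token CoT answers
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
    ),
)
trainer.train()
```

Mixing the two corpora in one run matches the description above: the long math/code reasoning traces and the shorter multi-domain responses are seen together during the second SFT stage.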

How was LONGCOT-Refine-500K constructed? First with QWQ and then refined with Qwen2.5-72B into shorter responses?

PowerInfer org

The LONGCOT-Refine-500K dataset was constructed using two approaches:

  1. For math and logical reasoning problems, we first used QWQ to generate initial responses, then refined them using Qwen2.5-72B-Instruct.

  2. For open-ended tasks (like report writing), we used an example-guided approach: providing a QWQ-generated response (to another problem) as a format reference, then having Qwen2.5-72B directly generate new responses following that format.
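
For anyone wanting to reproduce something similar, here is a rough sketch of both approaches against an OpenAI-compatible endpoint (e.g. a local vLLM server); the prompt wording and endpoint are illustrative assumptions, not the authors' exact pipeline.

```python
# Sketch of the two LONGCOT-Refine-style construction approaches.
# Prompts and endpoint are assumptions; only the overall flow follows the description above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def chat(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def refine_math(problem: str) -> str:
    # Approach 1: QWQ drafts a long chain of thought, then Qwen2.5-72B-Instruct
    # rewrites it into a shorter, cleaner response for the same problem.
    draft = chat("Qwen/QwQ-32B-Preview", problem)
    return chat(
        "Qwen/Qwen2.5-72B-Instruct",
        "Rewrite the following solution more concisely while keeping the reasoning correct.\n\n"
        f"Problem:\n{problem}\n\nDraft solution:\n{draft}",
    )

def generate_open_ended(task: str, format_example: str) -> str:
    # Approach 2: example-guided generation. A QWQ response to a *different* problem
    # serves only as a format reference; Qwen2.5-72B-Instruct writes the new answer.
    return chat(
        "Qwen/Qwen2.5-72B-Instruct",
        "Here is an example response showing the expected reasoning format:\n"
        f"{format_example}\n\nNow answer the following task in the same format:\n{task}",
    )
```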

tugstugi changed discussion status to closed
