Feel great!

#3
by DataSoul - opened

I've been using it for a while, and I'm really looking forward to your next release (or the official version). May I ask how long it will take?

CausalLM org

Thank you for your enthusiastic attention. Due to precision issues (likely stemming from the base model), we are currently rolling back progress and retraining with a revised strategy. As we also need to queue for large-scale GPU access at this time, training progress may be relatively slow. However, we will correspondingly include corrections and updates to the dataset in the next update. Please stay tuned.

Thank you for your reply, and I hope everything goes well with you.

CausalLM org

Recalibration on some continual pre-training datasets is currently underway, and here's some progress to show you - we're nearing an inflection point and we can start instruct fine-tuning very soon.

image.png

I've been using this model for a while, it hits the spot in what I do. Can't wait for the next release :)

Any news on the next model? This one is my favorite 34b model so far, so I'm excited for it.

Thank you for your kind attention. In fact, further fine-tuning of Yi-34B has been completed.

image.png

However, I am considering delaying the release of the model due to concerns about potentially misleading accusations arising from benchmark scores. I am currently working on addressing these issues.

Although I have never used these benchmark results as a promotion, they have still attracted some logically unfounded accusations: discussions/2#65df8b93d2d519f6154b48da discussions/5#65f5743fb6941db5c22abcfc

Additionally, regarding the long-context 200K performance of Yi-34B, our evaluation suggests that the pre-trained model's long-context capabilities are undertrained. While the Yi team later released an updated version, we believe it still attempts to improve performance by overfitting the retrieval task rather than adequately training the base model.

Therefore, we have made optimization attempts on Command R, the 35B model released by Cohere, and achieved better performance in long-context evaluations with a more permissive format (which is arguably more challenging for overfitted models).

image.png

However, I regret to inform you that I cannot publicly release these models at this time. I value my reputation and cannot tolerate these unfounded accusations. We will continue to address these issues and will eventually release the model weights, but the timeline remains uncertain.

I'm not a huge fan of benchmarks, and honestly, when I saw the high MMLU score I assumed this model was only going to be mediocre at best, since most of the top benchmark models I've tried have been kind of underwhelming. I run a separate test of my own to see how well models do for my own use cases, and the 34B beta model was surprisingly the best of all the Yi fine-tunes I tested; in fact, it was the only 34B Yi fine-tune that wasn't underwhelming in my test. The only other model that has performed decently in this test is plain Mixtral Instruct, without a fine-tune, and maybe a few 7B Mistral fine-tunes, but these smaller models are usually lacking in other areas. So I really hope to see what you release publicly soon to follow this model.

Also, I would not worry about public opinion. No matter what you or anyone else releases, if it is even half decent, there will be people who make accusations like this. It's unfortunate, but I understand where they're coming from: too many models are being trained with only benchmarks in mind, hurting their overall quality, so people are skeptical. It's easy for people to jump to conclusions given how many poor-quality models are tuned like this, and how easy it is to get away with gaming the benchmarks, at least to some degree. I don't agree with people jumping to conclusions without actual evidence, but I think it's something to accept as inevitable for anyone who releases any model, rather than something to be concerned or bothered by. It's good that you take care to try to dispel such claims, but one of the best ways to keep a good reputation, in my opinion, is to not be bothered by them. So it's best to keep up the clear, transparent, and informative communication without letting what a few negative people voice affect what you do too much.

You're right. Although I haven't conducted precise or quantitative tests on a test set, in terms of user experience in a bilingual Chinese-English environment, this model is undoubtedly the best all-around open-source model. In my opinion, the primary purpose of large models is to meet user needs, and this model fulfills those needs well, rather than aiming solely for high test scores.

The correct logic should be: excellent models achieve high scores; this is an excellent model; therefore, it achieves high scores. (This is a standard deductive inference, modus ponens, and there are sufficient grounds to support its premises.)
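For illustration, the inference above can be written out formally. This is a minimal sketch in Lean; the proposition names `Excellent` and `HighScore` are placeholders, not anything from the discussion:

```lean
-- Modus ponens: if excellence implies a high score,
-- and the model is excellent, then it has a high score.
example (Excellent HighScore : Prop)
    (h : Excellent → HighScore) (hE : Excellent) : HighScore :=
  h hE
```

The point of the syllogism is the direction of the implication: a high score follows from quality, not the other way around.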

It’s great to see you successfully fine-tuning your new model, congratulations! I share your disgust at baseless accusations.

Earlier I noticed elsewhere that someone recommended the model you made (34B-beta). I've actually used it myself and found it to be very useful in solving a variety of problems I've encountered. I subsequently recommended it to others because it really worked.

I trust the effort and professionalism you and your team put into fine-tuning the model. Not only are these accusations unfair, they can also have a negative impact on your motivation. I sincerely hope that you will not be affected by these negative comments and continue to maintain a high standard of work.

I understand the frustration of being blamed, but I believe your model has real value and is truly a fine-tuned improvement toward practical use. I sincerely hope that your model will get better and better and help more people solve real problems. At the same time, I hope that everyone will be able to evaluate your work more fairly in the future.

CausalLM org
edited Apr 1

Due to some underlying controversies that cannot be resolved in the short term, the release of the new model has been postponed indefinitely. Thank you for staying with us.

JosephusCheung changed discussion status to closed
CausalLM org

You may also enjoy this 35B model: https://huggingface.co/CausalLM/35b-beta-long

Thanks and admiration for your persistence in model training!

Cool!

Any chance of a 32B fine-tune? Among the new models, Qwen 1.5 32B appeals to me most, since it performs as well as Command R in most tasks while using less VRAM.
