This model's origin might be a scam

by ParasiticRogue - opened 29 days ago

29 days ago

Sorry for posting this here bartowski, but I think this is worth discussing for anybody passing by. I don't expect you to do anything about this model, in fact I think you should leave it up for the sake of future reference.

Anyway, the original model got deleted after two other users and I started digging around in the files to check whether it was a legit new model built from the ground up. Even the posts themselves had been deleted before that. The mergekit settings and tokens used suggested otherwise. My contribution was spotting that the config used the Mistral instruct template, which isn't really used outside of Mistral models.

ParasiticRogue

29 days ago

Tag, you're it. Anything you wanna contribute?

@Steelskull

@Delta-Vector (I think you were the second guy in the thread, but correct me if I'm wrong)

Steelskull

29 days ago

they also deleted the reddit post once they were called out

this seems to be an attempt to pass off a mergekit clown car MoE that used 4 mistral models as a new arch that's better than all current SOTA models

Delta-Vector

29 days ago

Yep, can confirm. If you look at the -E1 checkpoint up on their HF org, you can see a mergekit version in the safetensors. They may have done pretraining on the model, which is what you're supposed to do with a clown car MoE, but given that they didn't even bother removing the mergekit addition from their config, I find it highly likely it's just a grift.

Mixtral arch is also what mergekit outputs regardless of the previous arch.
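For anybody who wants to repeat the check described above on a local copy of a checkpoint, here is a minimal sketch using only the Python standard library. It relies on the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header with an optional `__metadata__` block) and simply searches that block, plus any `.json` files in the folder, for the string "mergekit". The function names and the folder layout are assumptions made for illustration, and the exact key mergekit writes is not assumed; a hit only suggests that mergekit touched the files, and no hit proves nothing.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def find_mentions(obj, needle="mergekit", path=""):
    """Collect JSON paths whose key or string value contains needle (case-insensitive)."""
    hits = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            here = f"{path}/{key}"
            if needle in str(key).lower():
                hits.append(here)
            hits.extend(find_mentions(value, needle, here))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            hits.extend(find_mentions(value, needle, f"{path}/{i}"))
    elif isinstance(obj, str) and needle in obj.lower():
        hits.append(path)
    return hits


def scan_checkpoint(folder):
    """Scan JSON files and safetensors metadata in a local checkpoint folder (hypothetical layout)."""
    report = {}
    for p in sorted(Path(folder).iterdir()):
        if p.suffix == ".json":
            data = json.loads(p.read_text(encoding="utf-8"))
        elif p.suffix == ".safetensors":
            # Only the optional free-form metadata block, not the tensor entries.
            data = read_safetensors_header(p).get("__metadata__", {})
        else:
            continue
        hits = sorted(set(find_mentions(data)))
        if hits:
            report[p.name] = hits
    return report
```

Calling `scan_checkpoint("path/to/checkpoint")` (placeholder path) returns a dict mapping each file name to the JSON paths where "mergekit" showed up, or an empty dict if nothing matched.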

sophosympatheia

29 days ago

Woof. Why some people do these things is beyond me. Thanks for exposing these frauds.

bartowski

Owner 29 days ago

Thank you so much for these reports!

I've gated the model with a link to this discussion, since I'd rather not suppress the info, but I do want to try to warn people before they blindly download it and waste their bandwidth!!

Real shame.. hope it was just an innocent accident rather than anything malicious!

Lockout

29 days ago

My guess is they made a doubled mixtral moe and ran some 1m dataset through it. A lot of claims and not a whole lot of substance. Absolute blast from the past.
