Is the model degrading in censorship?
Version 41-low-step-rl seems almost completely uncensored, while version 47 seems to have the same censorship as a standard flux model.
In my opinion what distinguishes this model from the others and makes it eligible is being uncensored, please don't lose that.
What do you mean, censored? They do the same thing; you just have to prompt better. At least explain what you mean before throwing baseless accusations.
I don't know what you're talking about, but there is no censorship in v47. I just used a random pony porn prompt from Civit and it generated just fine. Check your settings or your prompt. It's clearly something on your side.
I can confirm what @ruleez and @TotalNoob1 wrote: I test each version with the previous one, with given prompts and seeds, mostly NSFW. I saw no kind of censorship, only improvements. So ofc it may depend on your prompt (mine are pretty standard NSFW stuff), but afaik there has been no regression.
I thought it was just me, but I noticed that also. However, I was using 41 and 46. Perhaps it's because I'm using a Flux LoRA? V41 seems very good, while 46 just seemed to get things wrong; it muddies details like fingertips, faces and toes, as well as other body parts.
No issues with censorship here at all. Misshapen hands can often (though not always) be improved with more steps.
TL;DR
You appear to be describing deformed/poor-quality outputs; unoptimised parameters (prompt, CFG, etc.) are typically to blame, and Chroma is still in training, so expect some quality variation. OP is claiming something entirely different and unfounded: that certain concepts are missing from the model weights due to 'censorship' (alleged: "[...] same censorship as a standard flux model."). I've been using all the v47 checkpoints and now v48, and see no major difference in prompt adherence (even when testing sensitive concepts copied from CivitAI etc.). OP also didn't give reproducible parameters, which makes me skeptical, so I believe it's safer to assume user error until they show the receipts. Even more suspicious: OP ( @FRW43 ) has zero activity other than this post and no profile info whatsoever. I would be cautious about any claim a newcomer makes here, as cloak accounts are a common tool for bad-faith actors.
Btw: Chroma is still in training, so expect variations in quality between versions on your image concepts. Any issues with quality will be ironed out; 'censorship' is a totally different thing.
Response to @3dLemur
That is odd that you get quality issues. Most issues I have are with photorealism and certain art movements; when I play with illustrations like manga or cartoons, I find that medium easier to prompt for. (I also find I can use fewer steps and lower resolution, and iterate faster.)
If you suspect something simpler as the cause and you think the prompt, CFG & steps are optimized, perhaps try using the same seed while altering the scheduler/sampler.
I hope you find this helpful.
@3dLemur You can also DM me and am happy to help out.
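For what it's worth, the seed-fixed sampler sweep I suggested above can be sketched like this. Note that `generate` is a hypothetical stand-in for whatever your pipeline or UI exposes, not a real API; the point is just to hold prompt and seed constant while varying only the sampler:

```python
def sweep_samplers(generate, prompt, seed, samplers):
    """Run the same prompt and seed across several samplers.

    `generate` is whatever callable your pipeline exposes (hypothetical
    signature). Returns a dict mapping sampler name to its output, so
    differences between images can only come from the sampler choice.
    """
    return {name: generate(prompt=prompt, seed=seed, sampler=name)
            for name in samplers}

# Toy stand-in for a real pipeline, just to show the call shape:
def fake_generate(prompt, seed, sampler):
    return f"{sampler}:{seed}:{prompt[:10]}"

results = sweep_samplers(fake_generate, "a red fox, photorealistic", 1234,
                         ["euler", "dpmpp_2m", "heun"])
for name, out in results.items():
    print(name, "->", out)
```

If one sampler/scheduler combination fixes the muddied details at the same seed, the problem was the sampling setup, not missing concepts in the weights.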
I find v48 is better at photorealism (subjectively).
Am on staycation and image-gen is one of my hobbies.
I am no diffusion model expert of course; prompt engineering is the part I find hardest. (There isn't enough comprehensive Chroma-specific guidance yet that I could find, though general guides prove great for most things.)
Debunking the claim (with what we know about Chroma)
Since they haven't given us the reproducible params (likely for some purposeful reason), we can't know what caused the issue.
That is disregarding the fact that training is done in the open: one can see the live training runs and intermediate checkpoints if one wanted to do so.
Firstly, censorship for text-to-image doesn't usually occur at the unet level with FLUX. (Post-training safety runs overwrite concepts, which inevitably reduces overall model performance, not just prompt adherence; since the model has already been pre-trained, trying to omit 'sensitive' concepts from it is like trying to find a needle in a haystack.) It's more often done by the other components, like the text encoder/vision models (e.g. CLIP).
Also, I assume OP is using FP16 or FP8 for the T5 text encoder, because unlike some unet models, text (NLP) models like T5 are very sensitive to quantization. (Especially in the conditioning role; T5 is encoder-decoder, so there is tighter coupling inside the system, i.e. it is more sensitive to changes and quality loss.)
Textual embeddings and natural language are much more sensitive to imperfections than the image/vision embeddings of the unet: change the order of a word and it means something entirely different.
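To make the precision intuition concrete, here is a toy, stdlib-only sketch: quantize a fake 768-dim embedding vector with a crude uniform quantizer (not the real FP16/FP8 formats) at fewer and fewer bits, and watch the reconstruction error grow. The numbers are illustrative only, not a claim about actual T5 behaviour:

```python
import math
import random

def fake_quantize(x, bits):
    """Toy uniform quantizer on [-1, 1]; illustrative only, not real FP8/FP16."""
    levels = 2 ** (bits - 1)
    return round(x * levels) / levels

random.seed(0)
# Fake embedding vector, sized like a small text-encoder hidden state:
embedding = [random.uniform(-1.0, 1.0) for _ in range(768)]

for bits in (16, 8, 4):
    quantized = [fake_quantize(v, bits) for v in embedding]
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(embedding, quantized))
                     / len(embedding))
    print(f"{bits:2d}-bit rmse: {rmse:.6f}")
```

The error roughly doubles for every bit dropped, which is why a text encoder quantized too aggressively can subtly shift what a prompt means before the unet ever sees it.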
Analogy
Think of Chroma (the unet) as the train station you use to get to work, and T5 (the text encoder) as the ticket master you use to organize your journey (the interface). If you choose the wrong time (unoptimised prompt/params), you can be late, but you can improve next time. If you mistakenly board the wrong train (pick a lower quantization precision), you may well be much later to work (you spend lots of time tuning parameters while your concept happens to be dampened by a text encoder quantized below FP8).
I realise this is oversimplifying, but I couldn't come up with a better analogy, sorry.
Modified Architecture (not just a finetune)
Chroma is based on FLUX.1 schnell, but is not simply a fine-tune as many on HF are; there have been multiple surgical architecture modifications, done through ablation experiments and by replacing entire sections of the model (as detailed eloquently in the model card).
Lodestones calls this "re-pre-training" to illustrate the point.
For this reason alone, comparing the model to FLUX.1 is like comparing cabbage and cauliflower: on paper they are the exact same species of plant, yet appearance, taste & texture can vary a lot because they have been cultivated differently.
This is assuming they mean FLUX.1 schnell by "standard flux model" (Chroma is built from schnell, not dev).
OP's Claim: 'censorship'
"Version 41-low-step-rl seems almost completely uncensored, while version 47 seems to have the same censorship as a standard flux model.
In my opinion what distinguishes this model from the others and makes it eligible is being uncensored, please don't lose that." - FRW43 (OP)
The claim is that the model avoids certain concepts. This is entirely unfounded: it contradicts my observations with v47+, and OP doesn't provide any examples supporting their statement of 'censorship' (images & hyper-parameters), which would be trivial to do; many UIs even have a metadata-embed option.
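As an aside, if OP's UI embeds A1111-style "parameters" metadata in its outputs, pulling the settings back out is as simple as parsing the trailing key/value line. This is a rough sketch; real strings vary by UI, so treat the format here as an assumption:

```python
def parse_a1111_params(text):
    """Parse the trailing 'Key: value, Key: value' line of an A1111-style
    'parameters' string (a rough sketch; real strings vary by UI)."""
    last_line = text.strip().splitlines()[-1]
    params = {}
    for chunk in last_line.split(","):
        if ":" in chunk:
            key, _, value = chunk.partition(":")
            params[key.strip()] = value.strip()
    return params

# Hypothetical example string in the common A1111 layout:
example = (
    "a red fox, photorealistic\n"
    "Steps: 28, Sampler: Euler, CFG scale: 4.5, Seed: 1234, Size: 1024x1024"
)
print(parse_a1111_params(example))
```

With the seed, sampler, steps and CFG in hand, anyone could attempt a like-for-like reproduction across checkpoints, which is exactly the evidence missing from OP's claim.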
One factor is that the model is experimental (still in training), so random variation between checkpoints can happen, and since the model changes frequently, the same seed and prompt is almost guaranteed to produce wildly different results. (User error will be especially noticeable as the model generalises to concepts a given user isn't exercising, such as photography/realism; v47 is nearing the high-res tuning stages Lodestones said would be ~v48-50.)
This is just a thought experiment of course (theory).
Until they show receipts, the typical explanation is sampling bias/confirmation bias and user error. Every time a checkpoint is made, the statistical probabilities change, i.e. the same prompt & seed generate different results.
That doesn't mean the model is inferior.
Also, the model is heading towards its high-res training stages (Lodestones said ~v48-v50), so user error (poor prompt/hyper-parameters) on illustrative media such as manga/cartoons would probably be more pronounced.
This is just a theory; if they give us the reproducible hyper-parameters and it is a real issue, I am sure Lodestones (and the broader community) would be interested to know.
"Extraordinary claims require extraordinary evidence" - Dr Carl Sagan (Astronomer)
Credibility Red Flag: OP is a new account with no public activity other than this post
I strongly suspect OP of being a cloaked account; I have the receipts.
Evidence
OP ( @FRW43 ) has no credibility/reputation here, with no public details whatsoever apart from their username; their only public activity is this very post (as of 02/08/2025).
Typically, a credible pseudonymous account has posts & comments over time before it makes bold claims.
You can visit their profile here:
https://huggingface.co/FRW43
Strawman Argument Against
It's also technically possible they are a novice user who recently stumbled here from platforms like CivitAI, this is their first post ever on HF, and it is just coincidental that they look eerily similar to a throwaway or cloak account.
If this is actually true, I will retract this section. (I am keeping tabs, so only time will tell.)
Conclusion
It's worth raising the lack of any activity or profile info to guard against potential bad actors (even if this one isn't).
HF is a place with many popular & credible pseudonymous accounts who participate in the community productively; I am skeptical that this is one of them. I have followed their account, so if that by some miracle changes, I will retract this statement (only time will tell if it's genuine or not).
About Me & Disclaimer (for transparency)
My name is James Clarke BSc (CS) (Hons); I am a Software Engineer & amateur ML researcher in my free time (signature at bottom for personal details).
Disclaimer: I am a user of this model but am NOT a DiT model expert or professional artist; my professional (non-free-time) expertise is more about deploying local models for business use-cases (currently freelance).
Hidden cost of misinformation
It's worth being mindful that the claim OP made, if taken seriously (which we won't here), risks damaging the model author's reputation, even indirectly through social media and some press platforms (where, unlike here on HF, people are far less likely to know or care about the facts and how this technology actually works in reality).
Lodestones has one of the most transparent training pipelines: you can watch the training runs & debug logs live, and the intermediate checkpoints & logs between versions are auto-uploaded here: https://huggingface.co/lodestones/chroma-debug-development-only
Right now, OP's statement is just a claim with no backing, and their lack of account activity (one post) raises red flags about credibility.
James David Clarke BSc (Hons, Computer Science, CCCU),
Software Engineer,
Kent,
England
United Kingdom
Update 15:08 BST 02/08/2025:
Improved structural order, wording and cohesion between sections.
Version 41-low-step-rl seems almost completely uncensored, while version 47 seems to have the same censorship as a standard flux model.
In my opinion what distinguishes this model from the others and makes it eligible is being uncensored, please don't lose that.
I personally can't see it; the only thing I noticed is that it is now a bit harder to get deformed limbs compared to the older versions. They're still there, but, again, the model is still in a training phase.
Thank you @Impulse2000. I was comparing prompts with the same seed and settings. I work mostly in photorealism. The images were similar or the same, just fuzzier. Regardless, I am trying some new techniques as well as improving my prompts, and getting good results.