Tenos v1.25 - A Versatile FLUX.1 Finetune
Welcome to the official model card for the Tenos Finetune, a versatile and aesthetically-focused finetune of Black Forest Labs' groundbreaking FLUX.1 [dev] model.
This model is designed to be the heart of the Tenos Bot, our open-source, self-hosted Discord bot for ComfyUI. Our goal was to create a flexible model that excels at generating high-quality, vibrant images across a wide range of styles, from photorealism to digital art. We hope you enjoy using it as much as we enjoyed creating it!
🖼️ Gallery
Here are a few examples of what the Tenos finetune can do.
💡 Model Details
- Developed by: Tenos.ai
- Shared by: BobsBlazed
- Model type: Finetune
- Base model: black-forest-labs/FLUX.1-dev
- Language(s): English
- License: The license for this model is available in the LICENSE file. It is a derivative work and is subject to the Non-Commercial restrictions of the original FLUX.1 [dev] license.
⚙️ Usage & Recommendations
This model is provided in .safetensors and .gguf formats and is intended to be used with the UNETLoader and UNETLoader (GGUF) nodes, respectively, in a ComfyUI FLUX workflow.
- Steps: 24 - 32
- CFG: 1.0 (as required by FLUX models)
- Sampler: euler or dpmpp_2s_ancestral
- Scheduler: sgm_uniform
- Flux Guidance: 2.6 - 3.5 (Note: the model works very well at low guidance, especially if you're looking for a more edge-case style)
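For scripted ComfyUI workflows, the recommendations above can be collected in one place. A minimal sketch in Python; the dictionary keys mirror common KSampler inputs but are otherwise illustrative, not an official Tenos API:

```python
# Recommended sampling settings for the Tenos FLUX.1 finetune,
# per the ranges listed above. Values outside these ranges are
# not invalid, just less tested.
TENOS_SETTINGS = {
    "steps": 28,              # recommended range: 24-32
    "cfg": 1.0,               # FLUX models require CFG 1.0
    "sampler_name": "euler",  # or "dpmpp_2s_ancestral"
    "scheduler": "sgm_uniform",
    "flux_guidance": 3.0,     # recommended range: 2.6-3.5
}

def validate(settings: dict) -> bool:
    """Check a settings dict against the recommended ranges."""
    return (
        24 <= settings["steps"] <= 32
        and settings["cfg"] == 1.0
        and settings["sampler_name"] in ("euler", "dpmpp_2s_ancestral")
        and settings["scheduler"] == "sgm_uniform"
        and 2.6 <= settings["flux_guidance"] <= 3.5
    )
```

A helper like `validate` is handy when the settings are exposed to end users (for example, through a Discord bot command) and need a sanity check before being sent to the backend.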
VAE: FLUX.1 Kontext [dev] VAE
- Reasoning: The VAE (Variational Autoencoder) handles the mapping between images and the latent space. The Kontext [dev] VAE is part of the FLUX.1 Kontext model suite, an advanced version designed for enhanced image editing and generation with an emphasis on precision, character consistency, and robust iterative workflows. While the specific technical differences in VAE training aren't publicly detailed, its inclusion in the Kontext suite indicates a refinement tailored to these goals. In our testing, this VAE reconstructs visual detail more faithfully than the base FLUX.1 [dev] VAE, producing more accurate textures, more vibrant coloring, and finer detail in generated images, which aligns with the needs of a model focused on high-quality outputs and editing capabilities.
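To make the VAE's role concrete: in a FLUX workflow, denoising happens on a latent grid that the VAE produces by downsampling the image 8x spatially, with 16 latent channels (the published configuration of FLUX.1's autoencoder; adjust the factors if your checkpoint differs). A minimal shape calculation under those assumptions:

```python
def latent_shape(width: int, height: int,
                 downscale: int = 8, channels: int = 16) -> tuple:
    """Shape of the latent a FLUX-style VAE produces from an image.

    downscale=8 and channels=16 are assumed here to match FLUX.1's
    autoencoder configuration; they are not specific to Tenos.
    """
    assert width % downscale == 0 and height % downscale == 0
    return (channels, height // downscale, width // downscale)

# A 1024x1024 generation is denoised on a 16x128x128 latent:
print(latent_shape(1024, 1024))  # (16, 128, 128)
```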
CLIP L: LongCLIP-GmP-ViT-L-14
- Recommendation: For Tenos, which is based on FLUX.1, we strongly recommend using a CLIP model with an extended text context window to leverage the power of detailed prompts. Based on research into available models and their suitability for Flux-based text-to-image generation, the zer0int/LongCLIP-GmP-ViT-L-14 model is the top choice.
- Why LongCLIP-GmP?
- Extended Context (248 Tokens): Unlike standard CLIP models limited to 77 tokens, LongCLIP-GmP can process significantly longer, more detailed text prompts. This is crucial for guiding FLUX to generate images that precisely match complex descriptions.
- Improved Generalization: The Geometric Parametrization (GmP) technique used in this model leads to better training stability and generalization, resulting in a more robust text encoder capable of understanding and guiding the generation of a wider variety of concepts.
- Strong FLUX.1 Compatibility & Support: This model specifically provides documentation and examples for integration with FLUX.1 using the full 248-token context within standard frameworks, simplifying its use in your workflow. While other LongCLIP variants exist, the LongCLIP-GmP model offers the best balance of long context, performance benefits relevant to T2I guidance, and practical integration support for FLUX.1 development.
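The practical consequence of the 77- vs. 248-token limit is truncation: anything past the encoder's context window never reaches the model at all. A toy sketch of that effect, using whitespace tokens for simplicity (real CLIP tokenization is subword-based, so actual token counts differ from word counts):

```python
def truncated(prompt: str, max_tokens: int) -> str:
    """Keep only the leading tokens the text encoder will see."""
    return " ".join(prompt.split()[:max_tokens])

# A long, detailed prompt: 100 "words" standing in for tokens.
prompt = " ".join(f"detail{i}" for i in range(100))

# A standard 77-token CLIP silently drops the tail of the prompt...
short = truncated(prompt, 77)

# ...while LongCLIP's 248-token window keeps all of it.
long_ = truncated(prompt, 248)

print(len(short.split()), len(long_.split()))  # 77 100
```

This is why detail placed early in a prompt is reliably honored while detail past the context limit is simply ignored, and why a longer context window matters for the dense, descriptive prompts FLUX responds to.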
CLIP T5XXL: flan-t5-xxl-fused FP8
- Reasoning: The T5-XXL model works alongside CLIP L to encode the semantic meaning of the text prompt. The flan-t5-xxl-fused model is recommended because it is an optimized version of the standard T5-XXL v1.1 that has been instruction-tuned (as part of the FLAN family). This tuning enhances its ability to understand and interpret nuanced instructions and concepts within the text prompt, leading to more precise and faithful outputs for image-generation tasks than the base T5-XXL v1.1. The 'fused' aspect simply combines the model files for ease of use.
- Precision & Performance: We suggest the FP8 e4m3 version, as it offers significant VRAM savings with only minimal perceived quality loss compared to the FP16 or FP32 versions. This is an excellent trade-off between quality and inference speed for users with limited VRAM. The FP32 version offers the highest potential quality but requires substantially more powerful hardware.
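A back-of-envelope calculation makes the precision trade-off concrete. The ~4.7 billion parameter count used here is an approximate, commonly cited figure for the T5-XXL text encoder, not a number from this card; the calculation covers weights only, and activations plus the other pipeline models add more:

```python
# Approximate VRAM needed just to hold the T5-XXL encoder weights
# at different precisions. PARAMS is an assumed rough figure.
PARAMS = 4.7e9

def weight_gib(bytes_per_param: float) -> float:
    """GiB occupied by PARAMS parameters at the given precision."""
    return PARAMS * bytes_per_param / 2**30

for name, nbytes in [("FP32", 4), ("FP16", 2), ("FP8 e4m3", 1)]:
    print(f"{name}: ~{weight_gib(nbytes):.1f} GiB")
# FP32: ~17.5 GiB, FP16: ~8.8 GiB, FP8 e4m3: ~4.4 GiB
```

Halving the bytes per parameter halves the weight footprint, which is why the FP8 build fits comfortably on consumer GPUs where the FP32 build does not.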
🏋️ Training Process
This model is the result of multiple training stages and was merged with custom LoRAs to achieve its unique aesthetic. The training dataset consists of over 8 million images, combining publicly available and ethically sourced repositories with a large set of privately sourced images created and owned by Tenos.ai.
A huge thank you to Lodestone-rock for their incredible work on torchastic. Training this model on a single RTX 4080 would have been impossible without it!
⚖️ Responsible Use & Limitations
This model is subject to the use restrictions outlined in the base model's license. Please ensure you have read and understood the full license before using this model.
Out-of-Scope Use
The model and its derivatives may not be used:
- In any way that violates any applicable national, federal, state, local, or international law or regulation.
- For the purpose of exploiting, harming, or attempting to exploit or harm minors in any way.
- To generate or disseminate verifiably false information and/or content with the purpose of harming others.
- To generate or disseminate personally identifiable information that can be used to harm an individual.
- To harass, abuse, threaten, stalk, or bully individuals or groups of individuals.
- To create non-consensual explicit content or illegal pornographic material.
- For fully automated decision-making that adversely impacts an individual's legal rights.
- For generating or facilitating large-scale disinformation campaigns.
Bias, Risks, and Limitations
- This model is not intended or able to provide factual information.
- As a statistical model, this checkpoint might amplify existing societal biases.
- The model may fail to generate output that perfectly matches the prompt.
- Prompt following is heavily influenced by prompting style and the use of guidance.
Model Card Author: BobsBlazed
Contact: [email protected]