Verity-1A: Florence-2 + FLODA Deepfake Detection Model

🎯 Model Description

Verity-1A is an advanced multimodal model combining Microsoft's Florence-2-base with the FLODA-deepfake LoRA adapter for enhanced AI-generated content detection. This fusion creates a specialized model optimized for identifying deepfakes and AI-generated images while maintaining Florence-2's powerful vision-language capabilities.

🏗️ Model Architecture

Base Model: Microsoft Florence-2-base (768d architecture)
Enhancement: FLODA-deepfake LoRA adapter
Model Size: ~447 MB
Optimization: PEFT-based fusion for efficient inference

🚀 Key Features

✅ Deepfake Detection: Specialized for AI-generated content identification
✅ Multimodal: Combines vision and language understanding
✅ Compact: 6.7x smaller than Florence-2-large
✅ Production-Ready: Fully validated and optimized

📊 Performance

Architecture: 768-dimensional embeddings
Parameters: ~232M parameters
Inference: Optimized for real-time detection
Compatibility: Full Transformers ecosystem support

🛠️ Usage

from transformers import AutoModelForCausalLM, AutoProcessor
import torch

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "zelus82/verity-1A",
    torch_dtype=torch.float16,
    trust_remote_code=True
)

# Load processor
processor = AutoProcessor.from_pretrained(
    "zelus82/verity-1A",
    trust_remote_code=True
)

# Example usage for deepfake detection
def detect_deepfake(image, text_prompt="Is this image AI-generated?"):
    inputs = processor(text=text_prompt, images=image, return_tensors="pt")
    
    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            num_beams=3
        )
    
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return generated_text

🎓 Training Details

Base Training: Microsoft Florence-2-base foundation
Specialization: FLODA-deepfake LoRA fine-tuning
Fusion Method: PEFT merge_and_unload for optimal performance
Validation: Comprehensive 666-tensor validation passed

📋 Model Card

Attribute	Value
Model Type	Multimodal Vision-Language
Base Architecture	Florence-2
Specialization	Deepfake Detection
Model Size	447 MB
Parameters	~232M
Precision	Float16
License	MIT

🔧 Technical Specifications

Hidden Size: 768
Vocabulary Size: 51,289
Vision Encoder: Advanced transformer-based
Language Model: Optimized for detection tasks
LoRA Rank: 8 (optimal efficiency/performance)

⚠️ Limitations

Optimized specifically for deepfake detection tasks
Based on Florence-2-base architecture (768d)
Not compatible with Florence-2-large components
Requires trust_remote_code=True for full functionality

📄 Citation

@model{verity1a2024,
  title={Verity-1A: Florence-2 Enhanced Deepfake Detection},
  author={zelus82},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/zelus82/verity-1A}
}

🤝 Acknowledgments

Microsoft for the Florence-2 foundation model
FLODA team for the deepfake detection adapter
Hugging Face for the ecosystem and hosting

📞 Contact

For questions or collaborations, please reach out through the Hugging Face community discussions.

Built with ❤️ for safer AI content detection