metadata
license: apache-2.0
language:
- en
base_model:
- LanguageBind/Open-Sora-Plan-v1.3.0
DATAGRID-Open-Sora-Plan-v1.3.0-0.16M
DATAGRID-Open-Sora-Plan-v1.3.0-0.16M is a text-to-video diffusion model based on the Open-Sora-Plan architecture. DATAGRID Inc. fine-tuned it on a custom dataset of 0.16 million (160,000) royalty-free video clips to generate high-quality videos from text prompts. It also supports video-to-video (V2V) inpainting.
Model Details
- Developed by: DATAGRID Inc.
- Model type: Text-to-Video, Inpainting
- Languages: English
- License: Apache 2.0
- Finetuned from model: Open-Sora-Plan-v1.3.0
Model Description
This model extends the capabilities of Open-Sora-Plan by fine-tuning it on a curated, proprietary dataset.
Training Details
- Training Data: Fine-tuned on a custom dataset of 0.16 million royalty-free video-text pairs, independently collected and curated by DATAGRID Inc. with a focus on diverse scenes, motions, and objects. To prepare training data for V2V (video-to-video) inpainting, we built an automated mask generation pipeline that uses models such as Meta AI's SAM2 (Segment Anything Model 2) and Microsoft's Florence2 to generate masks for target objects in videos. This significantly improved efficiency and reduced costs compared with manual annotation.
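The mask generation step described above can be pictured as a detect-then-segment loop over frames. The following is only a minimal sketch of that idea, not DATAGRID's pipeline: `generate_video_masks`, `detect_boxes`, and `segment_from_box` are hypothetical names, the two callables stand in for a Florence2-style detector and a SAM2-style segmenter, and the dummy implementations exist only so the example runs without model weights. A real pipeline would likely propagate masks across frames rather than segment each frame independently.

```python
from typing import Callable, List, Tuple

import numpy as np

# Hypothetical box convention: (x0, y0, x1, y1) in pixels.
Box = Tuple[int, int, int, int]


def generate_video_masks(
    frames: np.ndarray,
    phrase: str,
    detect_boxes: Callable[[np.ndarray, str], List[Box]],
    segment_from_box: Callable[[np.ndarray, Box], np.ndarray],
) -> np.ndarray:
    """Return a (T, H, W) uint8 array that is 1 where the target object is."""
    num_frames, height, width = frames.shape[:3]
    masks = np.zeros((num_frames, height, width), dtype=np.uint8)
    for t, frame in enumerate(frames):
        # Step 1: locate the object named by the text phrase.
        for box in detect_boxes(frame, phrase):
            # Step 2: refine each box into a pixel mask and merge it in.
            masks[t] |= segment_from_box(frame, box).astype(np.uint8)
    return masks


# Dummy stand-ins (assumptions for illustration, not real model calls).
def dummy_detector(frame: np.ndarray, phrase: str) -> List[Box]:
    h, w = frame.shape[:2]
    return [(w // 4, h // 4, w // 2, h // 2)]


def dummy_segmenter(frame: np.ndarray, box: Box) -> np.ndarray:
    x0, y0, x1, y1 = box
    mask = np.zeros(frame.shape[:2], dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask


frames = np.zeros((4, 32, 32, 3), dtype=np.uint8)  # 4 blank 32x32 RGB frames
masks = generate_video_masks(frames, "a dog", dummy_detector, dummy_segmenter)
print(masks.shape, int(masks.sum()))
```

Passing the detector and segmenter as callables keeps the loop independent of any particular model library, so the stand-ins can be swapped for real model wrappers without changing the function.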
Inference Details
Our fork of Open-Sora-Plan, which adds mask handling to the inpainting dataset and pipeline, is available at DATAGRID-Research-org/Open-Sora-Plan. The fork extends the original code with enhanced mask processing for improved inpainting.
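For readers unfamiliar with mask handling in video inpainting, the sketch below shows one common convention: the region to regenerate is zeroed out of the input video, and the mask is kept as an extra conditioning channel. This is an illustrative assumption, not code taken from the fork; `prepare_inpainting_inputs` is a hypothetical helper, and the fork's actual tensor layout and mask polarity may differ.

```python
import numpy as np


def prepare_inpainting_inputs(video: np.ndarray, mask: np.ndarray):
    """Blank out the masked region of a video.

    video: (T, H, W, C) float array with values in [0, 1].
    mask:  (T, H, W) array, 1 where content should be regenerated.
    Returns the masked video and the mask as a (T, H, W, 1) channel.
    """
    if video.shape[:3] != mask.shape:
        raise ValueError("video and mask must share the (T, H, W) dimensions")
    mask_channel = mask.astype(video.dtype)[..., None]  # add a channel axis
    masked_video = video * (1.0 - mask_channel)  # keep only unmasked pixels
    return masked_video, mask_channel


video = np.ones((2, 4, 4, 3), dtype=np.float32)  # 2 all-white 4x4 frames
mask = np.zeros((2, 4, 4), dtype=np.uint8)
mask[:, 1:3, 1:3] = 1  # regenerate the central 2x2 square in every frame

masked_video, mask_channel = prepare_inpainting_inputs(video, mask)
print(masked_video.shape, mask_channel.shape)
```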
Results
T2V
V2V (Inpainting)
License
This model is released under the Apache 2.0 License.
Citation
Citation information will be provided at a later date.