---
license: apache-2.0
language:
- en
base_model:
- LanguageBind/Open-Sora-Plan-v1.3.0
---
# DATAGRID-Open-Sora-Plan-v1.3.0-0.16M

DATAGRID-Open-Sora-Plan-v1.3.0-0.16M is a Text-to-Video diffusion model based on the Open-Sora-Plan architecture. It has been fine-tuned by DATAGRID Inc. on a custom dataset of 0.16 million royalty-free video clips to generate high-quality videos from text prompts.

## Model Details
- **Developed by:** [DATAGRID Inc.](https://datagrid.co.jp/)
- **Model type:** Text-to-Video, Inpainting
- **Languages:** English
- **License:** Apache 2.0
- **Finetuned from model:** [Open-Sora-Plan-v1.3.0](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.3.0)

### Model Description

This model extends the capabilities of [Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan) by fine-tuning it on a curated, proprietary dataset.

## Training Details

- **Training Data:** Fine-tuned on a custom dataset of 0.16 million royalty-free video-text pairs, independently collected and curated by DATAGRID Inc. to cover diverse scenes, motions, and objects. To prepare training data for V2V inpainting, we built an automated mask generation pipeline that uses Meta AI's SAM 2 (Segment Anything Model 2) and Microsoft's Florence-2 to generate masks for target objects in videos. Automating this step was more efficient and less costly than traditional manual annotation.
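
To give a feel for the kind of per-frame mask data such a pipeline produces, here is a minimal sketch that rasterizes one bounding box per frame into a binary mask. It is a deliberately simplified stand-in and not the actual pipeline: it calls neither SAM 2 nor Florence-2, it yields box-shaped rather than object-shaped masks, and the names `box_to_mask` and `boxes_to_video_mask` are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, x1/y1 exclusive
Mask = List[List[int]]           # height x width grid, 1 = region to inpaint


def box_to_mask(box: Optional[Box], height: int, width: int) -> Mask:
    """Rasterize one bounding box into a binary mask (hypothetical helper)."""
    mask = [[0] * width for _ in range(height)]
    if box is None:
        # Assumption: a frame with no detected object gets an empty mask.
        return mask
    x0, y0, x1, y1 = box
    # Clamp the box to the frame so out-of-range detections cannot overflow.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    for y in range(y0, y1):
        for x in range(x0, x1):
            mask[y][x] = 1
    return mask


def boxes_to_video_mask(
    boxes: List[Optional[Box]], height: int, width: int
) -> List[Mask]:
    """Build one mask per frame from a per-frame list of boxes."""
    return [box_to_mask(box, height, width) for box in boxes]


# Three 4x4 frames: a detection, a miss, and a box that exceeds the frame.
video_mask = boxes_to_video_mask(
    [(1, 1, 3, 3), None, (2, 0, 10, 2)], height=4, width=4
)
```

In a real pipeline the box would typically serve only as a prompt for a segmentation model, which returns a pixel-accurate mask in place of the filled rectangle used here.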

## Inference Details

Our fork of Open-Sora-Plan is available at [DATAGRID-Research-org/Open-Sora-Plan](https://github.com/DATAGRID-Research-org/Open-Sora-Plan). It adds mask handling to the inpainting dataset and pipeline, extending the inpainting functionality of the original model.
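
To make the role of the mask concrete, the sketch below shows mask-based compositing, a common convention in inpainting: pixels where the mask is 1 are taken from the generated frame, and all other pixels are kept from the original. Whether the fork applies the mask in exactly this way is an assumption; `composite_frame` is a hypothetical helper, and frames are single-channel nested lists for brevity.

```python
from typing import List

Frame = List[List[float]]  # height x width, single channel for brevity
Mask = List[List[int]]     # same shape as the frame, 1 = inpainted region


def composite_frame(original: Frame, generated: Frame, mask: Mask) -> Frame:
    """Take generated pixels inside the mask and original pixels outside it."""
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("frames and mask must have the same height")
    result: Frame = []
    for orig_row, gen_row, mask_row in zip(original, generated, mask):
        if not (len(orig_row) == len(gen_row) == len(mask_row)):
            raise ValueError("frames and mask must have the same width")
        result.append(
            [g if m else o for o, g, m in zip(orig_row, gen_row, mask_row)]
        )
    return result


original = [[0.1, 0.2], [0.3, 0.4]]
generated = [[0.9, 0.8], [0.7, 0.6]]
mask = [[0, 1], [1, 0]]

result = composite_frame(original, generated, mask)
```

Applied frame by frame, this leaves the unmasked part of the source video untouched.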

## Results

### T2V
| category | prompt | Open-Sora-Plan-v1.3.0 | DATAGRID-Open-Sora-Plan-v1.3.0-0.16M |
|:-----:|:-----:|:---------------------:|:--------------------------------:|
| dynamic degree| an airplane accelerating to gain speed | ![](./doc/VBench/dynamic_degree/org.gif) | ![](./doc/VBench/dynamic_degree/dg.gif) |
| object class | a bicycle | ![](./doc/VBench/object_class/org.gif) | ![](./doc/VBench/object_class/dg.gif) |
| human action | A person is ice skating | ![](./doc/VBench/human_action/org.gif) | ![](./doc/VBench/human_action/dg.gif) |
| color | A pink bird | ![](./doc/VBench/color/org.gif) | ![](./doc/VBench/color/dg.gif) |
| imaging quality | this is how I do makeup in the morning | ![](./doc/VBench/imaging_quality/org.gif) | ![](./doc/VBench/imaging_quality/dg.gif) |
| spatial relationship | a kite on the top of a skateboard, front view | ![](./doc/VBench/spatial_relationship/org.gif) | ![](./doc/VBench/spatial_relationship/dg.gif) |

### V2V (Inpainting)

| original video | prompt (short) | mask | DATAGRID-Open-Sora-Plan-v1.3.0-0.16M (Inpaint) |
|:-----:|:-----:|:-----:|:--------------------------------:|
| ![](./doc/Inpaint/raw/concreate.gif) | A **juicer** pours **orange juice** into a container. ... Morning **juice-making** scene. | ![](./doc/Inpaint/mask/concreate.gif) | ![](./doc/Inpaint/dg/concreate.gif) |
| ![](./doc/Inpaint/raw/fox.gif) | A reddish-brown **cat** resting in a grassy field, ... relaxed and content. | ![](./doc/Inpaint/mask/fox.gif) | ![](./doc/Inpaint/dg/fox.gif) |


## License

This model is released under the Apache 2.0 License.

## Citation

Citation information will be provided at a later date.