Upload folder using huggingface_hub
- README.md +55 -0
- README_English.md +53 -0
- config.json +36 -0
- diffusion_pytorch_model.safetensors +3 -0
README.md
ADDED
@@ -0,0 +1,55 @@
+---
+license: cc-by-nc-4.0
+language:
+- en
+base_model:
+- stabilityai/stable-diffusion-3.5-medium
+---
+# Text-Image-to-Image Trained on MANGA109 Pose HA Manga Images
+
+This repository hosts the image generation model for [MANGA109 Pose tools](https://github.com/kuri-lab/MANGA109-Pose-tools). Create the conditioning images that are fed to the model with the repository linked above.
+
+
+## Training Parameters
+| Argument | Value |
+| ---- | ---- |
+| resolution | 512 |
+| train batch size | 4 |
+| learning rate | 1e-05 |
+| mixed precision | fp16 |
+| max train steps | 200,000 |
+
+## Training Dataset
+MANGA109 Pose HA, split into training, validation, and test sets at a ratio of 8:1:1.
+
+## Author's Environment
+- GPU: H100 NVL (1 unit)
+- CUDA: 12.4
+- PyTorch: 2.6.0+cu124
+- diffusers: 0.33.0.dev0
+
+## Computation Time
+88 hours on a single H100 NVL (94 GB) GPU.
+1.58 seconds per training step.
+
+## License
+This repository is licensed under
+[Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/deed.en).
+
+## Citation
+If you use this repository in your research, please consider citing it with the following BibTeX entry.
+
+```
+@article{okada2025manga109pose,
+  title={MANGA109 に姿勢情報を追加したデータセットの構築による姿勢を制御した漫画キャラクター画像生成},
+  author={岡田 湧路 and 北川 峻 and 渡邉 謙吾 and 稲葉 通将 and 橋本 敦史 and 栗原 聡},
+  journal={人工知能学会全国大会論文集},
+  volume={JSAI2025},
+  pages={2O1GS1005-2O1GS1005},
+  year={2025}
+}
+```
+
+## Update History
+* 2025/04/25: [Public release]
+*
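The 8:1:1 split described in the README above can be sketched as follows. This is only an illustration: the README does not say how the split was produced, so the shuffling, the seed, the splitting unit (individual image IDs), and the helper name `split_8_1_1` are all assumptions rather than the authors' procedure.

```python
import random


def split_8_1_1(items, seed=0):
    """Shuffle items and split them into train/val/test at roughly 8:1:1.

    Hypothetical helper: the shuffle and the seed are assumptions, not the
    documented procedure behind the released model.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_val = len(shuffled) // 10
    n_test = len(shuffled) // 10
    n_train = len(shuffled) - n_val - n_test  # remainder goes to training
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    image_ids = [f"img_{i:04d}" for i in range(1000)]  # placeholder IDs
    train, val, test = split_8_1_1(image_ids)
    print(len(train), len(val), len(test))  # 800 100 100
```

For manga data, splitting by volume or title rather than by image may be preferable to avoid near-duplicate panels leaking across sets; which unit was actually used is not stated in the README.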
README_English.md
ADDED
@@ -0,0 +1,53 @@
+---
+license: cc-by-nc-4.0
+language:
+- en
+base_model:
+- stabilityai/stable-diffusion-3.5-medium
+---
+
+# Text-Image-to-Image Trained on MANGA109 Pose HA Manga Images
+
+This repository contains the image generation model for [MANGA109 Pose tools](https://github.com/kuri-lab/MANGA109-Pose-tools), trained on manga images from MANGA109 Pose HA.
+Please create the conditioning images for the model with the repository linked above.
+
+## Training Parameters
+| Argument | Value |
+| ---- | ---- |
+| resolution | 512 |
+| train batch size | 4 |
+| learning rate | 1e-05 |
+| mixed precision | fp16 |
+| max train steps | 200,000 |
+
+## Training Dataset
+The MANGA109 Pose HA dataset was split into training, validation, and test sets in an 8:1:1 ratio.
+
+## Author's Environment
+- GPU: H100 NVL (1 unit)
+- CUDA: 12.4
+- PyTorch: 2.6.0+cu124
+- diffusers: 0.33.0.dev0
+
+## Computation Time
+Training was conducted on a single H100 NVL GPU (94 GB) and took 88 hours.
+Each training step took approximately 1.58 seconds.
+
+## License
+This repository is licensed under [Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/deed.en).
+
+## Citation
+If you use this repository in your research, please consider citing it using the following BibTeX entry:
+```
+@article{okada2025manga109pose,
+  title={MANGA109 に姿勢情報を追加したデータセットの構築による姿勢を制御した漫画キャラクター画像生成},
+  author={岡田 湧路 and 北川 峻 and 渡邉 謙吾 and 稲葉 通将 and 橋本 敦史 and 栗原 聡},
+  journal={人工知能学会全国大会論文集},
+  volume={JSAI2025},
+  pages={2O1GS1005-2O1GS1005},
+  year={2025}
+}
+```
+
+## Update History
+* 2025/04/25: [Public Release]
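The two timing figures in the README agree with each other, which a short calculation confirms. The sketch below uses only numbers stated above (200,000 steps, about 1.58 seconds per step, batch size 4). The variable names are illustrative, and the sample count assumes no gradient accumulation, which the README does not mention either way.

```python
MAX_TRAIN_STEPS = 200_000   # "max train steps" from the README table
SECONDS_PER_STEP = 1.58     # approximate time per training step
TRAIN_BATCH_SIZE = 4        # "train batch size" from the README table

total_seconds = MAX_TRAIN_STEPS * SECONDS_PER_STEP
total_hours = total_seconds / 3600

# Assumption: one optimizer step consumes exactly one batch.
samples_seen = MAX_TRAIN_STEPS * TRAIN_BATCH_SIZE

print(f"{total_hours:.1f} hours")   # 87.8 hours
print(f"{samples_seen:,} samples")  # 800,000 samples
```

The result rounds to the 88 hours reported in the README.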
config.json
ADDED
@@ -0,0 +1,36 @@
+{
+  "_class_name": "SD3ControlNetModel",
+  "_diffusers_version": "0.32.1",
+  "_name_or_path": "stabilityai/stable-diffusion-3.5-medium",
+  "attention_head_dim": 64,
+  "caption_projection_dim": 1536,
+  "dual_attention_layers": [
+    0,
+    1,
+    2,
+    3,
+    4,
+    5,
+    6,
+    7,
+    8,
+    9,
+    10,
+    11,
+    12
+  ],
+  "extra_conditioning_channels": 0,
+  "force_zeros_for_pooled_projection": true,
+  "in_channels": 16,
+  "joint_attention_dim": 4096,
+  "num_attention_heads": 24,
+  "num_layers": 12,
+  "out_channels": 16,
+  "patch_size": 2,
+  "pos_embed_max_size": 384,
+  "pos_embed_type": "sincos",
+  "qk_norm": "rms_norm",
+  "sample_size": 128,
+  "use_pos_embed": true
+}
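A few quantities follow directly from the values in this config, and working them out helps relate it to the 512-pixel training resolution in the README. The sketch below copies a subset of the keys shown above. The 8x VAE downsampling factor is an assumption about the Stable Diffusion 3 family that does not appear in this commit, and the variable names are illustrative.

```python
import json

# Subset of the config.json shown above; only the keys used below.
CONFIG_JSON = """
{
  "attention_head_dim": 64,
  "caption_projection_dim": 1536,
  "in_channels": 16,
  "num_attention_heads": 24,
  "patch_size": 2,
  "sample_size": 128
}
"""

VAE_SCALE_FACTOR = 8    # assumption: the SD3 VAE downsamples each side by 8
TRAIN_RESOLUTION = 512  # "resolution" from the README table

cfg = json.loads(CONFIG_JSON)

# Width of the attention blocks: heads times per-head dimension.
inner_dim = cfg["num_attention_heads"] * cfg["attention_head_dim"]

# Pixel size implied by the default latent sample_size.
default_pixels = cfg["sample_size"] * VAE_SCALE_FACTOR

# Latent side length and patch-token count at the training resolution.
latent_side = TRAIN_RESOLUTION // VAE_SCALE_FACTOR
tokens_per_image = (latent_side // cfg["patch_size"]) ** 2

print(inner_dim)         # 1536
print(default_pixels)    # 1024
print(latent_side)       # 64
print(tokens_per_image)  # 1024
```

The computed `inner_dim` equals `caption_projection_dim`, and under the stated assumption a 512x512 image corresponds to a 64x64 latent, which is half of the default `sample_size` per side.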
diffusion_pytorch_model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:45bee064703ab9be38fb816c2f9fddadb08c2a30920f686a6fc15b8d09c2cc83
+size 5950710432
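The three lines above are a Git LFS pointer, not the weights themselves: the repository stores the hash and byte size, and the real file is fetched separately. As a rough sketch, the pointer can be parsed and its size converted to a readable unit as follows. The helper name `parse_lfs_pointer` and the returned field names are made up for this illustration.

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:45bee064703ab9be38fb816c2f9fddadb08c2a30920f686a6fc15b8d09c2cc83
size 5950710432
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
size_gib = pointer["size_bytes"] / 2**30

print(pointer["hash_algo"], len(pointer["digest"]))  # sha256 64
print(f"{size_gib:.2f} GiB")                         # 5.54 GiB
```

After downloading, hashing the file with SHA-256 and comparing against `digest` is a simple way to check that the transfer completed intact.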
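With `config.json` and the safetensors file in place, the model can in principle be loaded through the SD3 ControlNet classes in `diffusers`, since the config declares `SD3ControlNetModel` and the README names `stabilityai/stable-diffusion-3.5-medium` as the base model. The following is a minimal sketch under those assumptions, not the authors' documented usage. `controlnet_path` and `pose_image_path` are placeholders, the helper names are hypothetical, the step count is an arbitrary choice, and the base model may require accepting its license on the Hub before it can be downloaded.

```python
def load_pose_pipeline(controlnet_path, device="cuda"):
    """Build an SD3 ControlNet pipeline around the weights in this commit.

    controlnet_path is a placeholder: a local folder (or Hub repo id) that
    holds config.json and diffusion_pytorch_model.safetensors.
    """
    # Imported lazily so the heavy dependencies are only needed when called.
    import torch
    from diffusers import SD3ControlNetModel, StableDiffusion3ControlNetPipeline

    controlnet = SD3ControlNetModel.from_pretrained(
        controlnet_path, torch_dtype=torch.float16
    )
    pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
        "stabilityai/stable-diffusion-3.5-medium",  # base_model from the README
        controlnet=controlnet,
        torch_dtype=torch.float16,
    )
    return pipe.to(device)


def generate(pipe, prompt, pose_image_path):
    """Generate one image from a text prompt and a pose conditioning image."""
    from diffusers.utils import load_image

    control_image = load_image(pose_image_path)
    result = pipe(
        prompt,
        control_image=control_image,
        height=512,  # matches the training resolution in the README
        width=512,
        num_inference_steps=28,  # assumption, not a documented setting
    )
    return result.images[0]
```

The conditioning image itself should come from the MANGA109 Pose tools repository, as the README instructs; its exact format is defined there and is not reproduced here.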