fev12345 committed on
Commit
85b5765
·
verified ·
1 Parent(s): ad8629c

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,55 @@
+ ---
+ license: cc-by-nc-4.0
+ language:
+ - en
+ base_model:
+ - stabilityai/stable-diffusion-3.5-medium
+ ---
+ # Text-Image-to-Image Model Trained on MANGA109 Pose HA Manga Images
+
+ This repository hosts the image generation model for [MANGA109 Pose tools](https://github.com/kuri-lab/MANGA109-Pose-tools). Create the condition images that are fed to the model with the repository linked above.
+
+
+ ## Training Parameters
+ | Argument | Value |
+ | ---- | ---- |
+ | resolution | 512 |
+ | train batch size | 4 |
+ | learning rate | 1e-05 |
+ | mixed precision | fp16 |
+ | max train steps | 200,000 |
+
+ ## Training Dataset
+ The MANGA109 Pose HA dataset, split into training, validation, and test sets at a ratio of 8:1:1.
+
+ ## Author's Environment
+ - GPU: H100 NVL (1 GPU)
+ - CUDA: 12.4
+ - PyTorch: 2.6.0+cu124
+ - diffusers: 0.33.0.dev0
+
+ ## Computation Time
+ Training took 88 hours on a single H100 NVL GPU (94 GB).
+ Each training step took 1.58 seconds.
+
+ ## License
+ This repository is licensed under
+ [Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/deed.en).
+
+ ## Citation
+ If you use this repository in your research, please consider citing it with the following BibTeX entry.
+
+ ```
+ @article{okada2025manga109pose,
+   title={MANGA109 に姿勢情報を追加したデータセットの構築による姿勢を制御した漫画キャラクター画像生成},
+   author={岡田 湧路 and 北川 峻 and 渡邉 謙吾 and 稲葉 通将 and 橋本 敦史 and 栗原 聡},
+   journal={人工知能学会全国大会論文集},
+   volume={JSAI2025},
+   pages={2O1GS1005-2O1GS1005},
+   year={2025}
+ }
+ ```
+
+ ## Update History
+ * 2025/04/25: [Public release]
+ *
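
Reviewer note: the training figures in the README above can be cross-checked with a little arithmetic. The sketch below is only a sanity check on the stated numbers. The constant names are illustrative, and the sample count assumes no gradient accumulation, which the README does not state.

```python
# Values copied from the README's training-parameter table and timing section.
MAX_TRAIN_STEPS = 200_000
SECONDS_PER_STEP = 1.58
TRAIN_BATCH_SIZE = 4


def training_hours(steps: int, seconds_per_step: float) -> float:
    """Convert a step count and a per-step time into wall-clock hours."""
    return steps * seconds_per_step / 3600


hours = training_hours(MAX_TRAIN_STEPS, SECONDS_PER_STEP)

# Assumption: one optimizer step consumes exactly one batch
# (no gradient accumulation), so samples = steps * batch size.
samples_seen = MAX_TRAIN_STEPS * TRAIN_BATCH_SIZE

print(f"estimated training time: {hours:.1f} h")
print(f"samples processed: {samples_seen:,}")
```

The estimate lands just under the 88 hours reported, so the per-step time and the total are consistent with each other.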
README_English.md ADDED
@@ -0,0 +1,53 @@
+ ---
+ license: cc-by-nc-4.0
+ language:
+ - en
+ base_model:
+ - stabilityai/stable-diffusion-3.5-medium
+ ---
+
+ # Text-Image-to-Image Trained on MANGA109 Pose HA Manga Images
+
+ This repository contains an image generation model trained using manga images from [MANGA109 Pose tools](https://github.com/kuri-lab/MANGA109-Pose-tools).
+ Please create the conditional input images using the repository linked above.
+
+ ## Training Parameters
+ | Argument | Value |
+ | ---- | ---- |
+ | resolution | 512 |
+ | train batch size | 4 |
+ | learning rate | 1e-05 |
+ | mixed precision | fp16 |
+ | max train steps | 200,000 |
+
+ ## Training Dataset
+ The MANGA109 Pose HA dataset was split into training, validation, and test sets in an 8:1:1 ratio.
+
+ ## Author's Environment
+ - GPU: H100 NVL (1 unit)
+ - CUDA: 12.4
+ - PyTorch: 2.6.0+cu124
+ - diffusers: 0.33.0.dev0
+
+ ## Computation Time
+ Training was conducted on a single H100 NVL GPU (94 GB) and took 88 hours.
+ Each training step took approximately 1.58 seconds.
+
+ ## License
+ This repository is licensed under the [Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/deed.en).
+
+ ## Citation
+ If you use this repository in your research, please consider citing it using the following BibTeX entry:
+ ```
+ @article{okada2025manga109pose,
+   title={MANGA109 に姿勢情報を追加したデータセットの構築による姿勢を制御した漫画キャラクター画像生成},
+   author={岡田 湧路 and 北川 峻 and 渡邉 謙吾 and 稲葉 通将 and 橋本 敦史 and 栗原 聡},
+   journal={人工知能学会全国大会論文集},
+   volume={JSAI2025},
+   pages={2O1GS1005-2O1GS1005},
+   year={2025}
+ }
+ ```
+
+ ## Update History
+ * 2025/04/25: [Public Release]
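
Reviewer note: the README says where the condition images come from but not how the weights are loaded. The sketch below shows one plausible way to do it with diffusers, given that config.json in this commit declares the class `SD3ControlNetModel`. The `controlnet_repo` argument is a placeholder for wherever these files live, the function names are illustrative, and the sampling settings (28 steps, guidance scale 7.0, conditioning scale 1.0) are assumptions rather than values from the author. The base model may also require accepting its license on the Hub before it can be downloaded.

```python
BASE_MODEL = "stabilityai/stable-diffusion-3.5-medium"  # from the README front matter


def load_pose_controlnet_pipeline(controlnet_repo: str, device: str = "cuda"):
    """Attach the ControlNet weights from this commit to the SD3.5-medium base.

    `controlnet_repo` is a placeholder: pass the Hub ID or local folder that
    holds config.json and diffusion_pytorch_model.safetensors.
    """
    # Imported lazily so this file can be imported without torch/diffusers installed.
    import torch
    from diffusers import SD3ControlNetModel, StableDiffusion3ControlNetPipeline

    # fp16 mirrors the mixed-precision setting listed in the README.
    controlnet = SD3ControlNetModel.from_pretrained(
        controlnet_repo, torch_dtype=torch.float16
    )
    pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
        BASE_MODEL, controlnet=controlnet, torch_dtype=torch.float16
    )
    return pipe.to(device)


def generate(pipe, prompt: str, pose_image_path: str, size: int = 512):
    """Generate one image from a text prompt and a pose condition image."""
    from diffusers.utils import load_image

    # The condition image is expected to come from MANGA109 Pose tools;
    # 512 matches the training resolution stated in the README.
    control_image = load_image(pose_image_path).resize((size, size))
    result = pipe(
        prompt=prompt,
        control_image=control_image,
        height=size,
        width=size,
        num_inference_steps=28,             # assumed sampling setting
        guidance_scale=7.0,                 # assumed sampling setting
        controlnet_conditioning_scale=1.0,  # assumed conditioning strength
    )
    return result.images[0]
```

Calling `load_pose_controlnet_pipeline` once and reusing the returned pipeline across several `generate` calls avoids reloading the checkpoint for every image.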
config.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "_class_name": "SD3ControlNetModel",
+   "_diffusers_version": "0.32.1",
+   "_name_or_path": "stabilityai/stable-diffusion-3.5-medium",
+   "attention_head_dim": 64,
+   "caption_projection_dim": 1536,
+   "dual_attention_layers": [
+     0,
+     1,
+     2,
+     3,
+     4,
+     5,
+     6,
+     7,
+     8,
+     9,
+     10,
+     11,
+     12
+   ],
+   "extra_conditioning_channels": 0,
+   "force_zeros_for_pooled_projection": true,
+   "in_channels": 16,
+   "joint_attention_dim": 4096,
+   "num_attention_heads": 24,
+   "num_layers": 12,
+   "out_channels": 16,
+   "patch_size": 2,
+   "pooled_projection_dim": 2048,
+   "pos_embed_max_size": 384,
+   "pos_embed_type": "sincos",
+   "qk_norm": "rms_norm",
+   "sample_size": 128,
+   "use_pos_embed": true
+ }
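
Reviewer note: a few quantities implied by this config can be derived by hand, which helps when checking that the ControlNet matches the base model. The sketch below copies a subset of the fields above and computes the transformer width and token counts. The VAE downsampling factor of 8 is an assumption about the SD3.5 base model and is not stored in this file, and the variable names are illustrative.

```python
import json

# Subset of the fields from config.json in this commit.
CONFIG_JSON = """
{
  "attention_head_dim": 64,
  "caption_projection_dim": 1536,
  "num_attention_heads": 24,
  "patch_size": 2,
  "sample_size": 128
}
"""

VAE_SCALE_FACTOR = 8  # assumption: the base model's VAE downsamples by 8x


def tokens_for_resolution(pixels: int, patch_size: int) -> int:
    """Number of image tokens for a square image of `pixels` per side."""
    latent_side = pixels // VAE_SCALE_FACTOR
    patches_per_side = latent_side // patch_size
    return patches_per_side * patches_per_side


config = json.loads(CONFIG_JSON)

# Transformer width is heads * head_dim; text features are projected to the same width.
inner_dim = config["num_attention_heads"] * config["attention_head_dim"]

# sample_size is given in latent units, so multiply by the VAE factor for pixels.
default_pixels = config["sample_size"] * VAE_SCALE_FACTOR

# Sequence length at the 512-pixel training resolution stated in the README.
tokens_512 = tokens_for_resolution(512, config["patch_size"])

print(inner_dim, default_pixels, tokens_512)
```

Under that assumption, the default sample size corresponds to 1024-pixel images, while the 512-pixel training resolution uses a quarter as many tokens.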
diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:45bee064703ab9be38fb816c2f9fddadb08c2a30920f686a6fc15b8d09c2cc83
+ size 5950710432
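
Reviewer note: the three lines above are a Git LFS pointer, not the weights themselves. Each line is a key followed by a space and a value, and `size` is the real file size in bytes. A minimal sketch of reading the pointer follows; the function name `parse_lfs_pointer` is illustrative and not part of any library.

```python
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:45bee064703ab9be38fb816c2f9fddadb08c2a30920f686a6fc15b8d09c2cc83
size 5950710432
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into version, hash algorithm, digest, and size."""
    # Each non-empty line is "<key> <value>"; split on the first space only.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
size_gib = pointer["size_bytes"] / 2**30

print(f"{pointer['algorithm']} digest, {size_gib:.2f} GiB")
```

The digest can be compared against a SHA-256 hash of the downloaded safetensors file to confirm the download is complete.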