Setup is a bit tedious, so if you just want to try it out, I recommend the Colab sample: it is free and requires only a browser.

I recommend using the Q6-f16 quantization or higher.

Introduction

This document briefly explains the following steps:

  1. Run the server for the image generation AI
  2. Run the server for the LLM
  3. Use a Python client to send novel data to the LLM server to get a prompt for the image generation AI, then send that prompt to the image generation AI server to generate an image

1. Run the server for the image generation AI

Install lllyasviel/stable-diffusion-webui-forge according to its documentation.
It should install with one click.

Once the installation is complete, download the model files.

Place animagine-xl-4.0-opt.safetensors under stable-diffusion-webui-forge/models/Stable-diffusion/
Place sdxl_vae.safetensors under stable-diffusion-webui-forge/models/VAE/

Start the server with the following command. A browser window may open, but you can close it because it is not used here.
16 GB of GPU memory should be enough. Some models run with less memory, so look for one if your GPU is smaller.

python launch.py --api --no-half-vae --medvram --xformers &
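Because the command above runs the server in the background (`&`), a client script may start before the API is accepting connections. The following is a minimal sketch of a readiness check, assuming the server listens on localhost port 7860 (the address the client script in section 3 uses). `wait_for_port` is a hypothetical helper name, not part of stable-diffusion-webui-forge.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.2)


# Example (assumption: the API is served on localhost:7860):
#   if not wait_for_port("localhost", 7860, timeout=120.0):
#       raise SystemExit("image generation server did not start in time")
```

An open port only shows that the process is listening; the model may still be loading, so the first request can take longer than later ones.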

2. Run the LLM server

To compile llama.cpp yourself, see:
https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md

To download a prebuilt binary, see:
https://github.com/ggml-org/llama.cpp/releases

Start it with the following command:

./llama.cpp/build/bin/llama-server -m FanFic-Illustrator-Q6_K-f16.gguf -c 5000
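The `-c 5000` option sets the context size to 5000 tokens, and the prompt and the generated text have to share that space. The client script in section 3 asks for up to 280 new tokens (`n_predict`), which leaves roughly 4720 tokens for the prompt. The sketch below makes that arithmetic explicit; `max_prompt_tokens` and `fits_context` are hypothetical helper names used only for illustration.

```python
CONTEXT_SIZE = 5000  # value passed to llama-server with -c
N_PREDICT = 280      # n_predict used by the client script in section 3


def max_prompt_tokens(context_size: int = CONTEXT_SIZE,
                      n_predict: int = N_PREDICT) -> int:
    """Tokens left for the prompt after reserving room for the generated text."""
    return max(context_size - n_predict, 0)


def fits_context(prompt_tokens: int, context_size: int = CONTEXT_SIZE,
                 n_predict: int = N_PREDICT) -> bool:
    """True if a prompt of this many tokens leaves room for n_predict new tokens."""
    return prompt_tokens <= max_prompt_tokens(context_size, n_predict)


print(max_prompt_tokens())  # 4720
```

The prompt length can be estimated with `len(tokenizer(prompt)["input_ids"])`, using the tokenizer loaded in the client script. If a long novel excerpt does not fit, shorten the excerpt or start the server with a larger `-c` value.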

3. Run the Python client

pip install transformers
pip install Pillow
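The client script below imports `transformers`, `PIL` (provided by Pillow), and `requests`. `requests` is not in the pip commands above; it is often already present as a dependency of other packages, but that is an assumption worth checking. A minimal sketch of such a check, where `missing_packages` is a hypothetical helper name:

```python
import importlib.util

# Maps import name to pip name. They differ in one case: Pillow is imported as PIL.
REQUIRED_MODULES = {
    "transformers": "transformers",
    "PIL": "Pillow",
    "requests": "requests",
}


def missing_packages(modules: dict[str, str]) -> list[str]:
    """Return the pip names of modules that cannot be found in this environment."""
    return [pip_name for module, pip_name in modules.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages(REQUIRED_MODULES)
if missing:
    print("missing packages: pip install " + " ".join(missing))
else:
    print("all client dependencies are installed")
```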

sample client_script.py

import transformers
from PIL import Image
import io
import base64
import requests
import json
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("webbigdata/FanFic-Illustrator")

COMMON_QUALITY_PROMPT = ", masterpiece, high score, great score, absurdres"
COMMON_NEGATIVE_PROMPT = "lowres, bad anatomy, bad hands, text, error, missing finger, extra digits, fewer digits, cropped, worst quality, low quality, low score, bad score, average score, signature, watermark, username, blurry"


def generate_image(prompt, negative_prompt, output_path):
    url = "http://localhost:7860/sdapi/v1/txt2img"

    prompt = prompt.replace("(", "\\(").replace(")", "\\)")

    payload = {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": 28,
        "cfg_scale": 5,
        "width": 832,
        "height": 1216,
        "sampler_name": "Euler a",
        "sampler_index": "Euler a",
        "override_settings": {
            "sd_model_checkpoint": "animagine-xl-4.0-opt.safetensors",
            "sd_vae": "sdxl_vae.safetensors"
        },
        "enable_hr": False,
        "batch_size": 1,
        "do_not_save_samples": True,
        "seed": -1,
    }

    response = requests.post(url=url, json=payload)
    response.raise_for_status()
    r = response.json()

    image = Image.open(io.BytesIO(base64.b64decode(r['images'][0])))
    image.save(output_path)

    return True


def get_novel_data():
    # Alice in Wonderland
    return """Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, "and what is the use of a book," thought Alice "without pictures or conversations?"
So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her.
There was nothing so very remarkable in that; nor did Alice think it so very much out of the way to hear the Rabbit say to itself, "Oh dear! Oh dear! I shall be late!" (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the Rabbit actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge.
In another moment down went Alice after it, never once considering how in the world she was to get out again.
The rabbit-hole went straight on like a tunnel for some way, and then dipped suddenly down, so suddenly that Alice had not a moment to think about stopping herself before she found herself falling down a very deep well."""

if __name__ == "__main__":

    novel_text = get_novel_data()
    category = "general"

    # "alice in wonderland" can be specified because it is registered as a tag on Danbooru.
    # Characters without a registered tag cannot be drawn; for those, specify "original".
    series_name = "alice in wonderland"
    series_description = "A fantasy novel written by Lewis Carroll (Charles Lutwidge Dodgson) in 1865. The story follows a young girl named Alice who falls through a rabbit hole into a fantastical world populated by peculiar anthropomorphic creatures. The tale plays with logic, giving the story lasting popularity with adults as well as children. It is considered one of the best examples of the literary nonsense genre and its narrative, structure, characters, and imagery have been enormously influential in both popular culture and literature."
    character_name = "alice (alice in wonderland)"
    character_description = "A curious, polite, and logical seven-year-old girl with an imagination and a sense of wonder. Alice finds herself in a world that challenges her perceptions of reality and proper behavior. Throughout her journey in Wonderland, she encounters absurd situations and characters that test her patience and understanding. Despite her confusion, Alice maintains her rational thinking and often attempts to apply real-world logic to the nonsensical Wonderland. She is characterized by her curiosity, bravery, and her struggle to maintain her identity in a world where nothing makes sense."

    system_prompt = "あなたは文章の一説を指定ジャンル・キャラクターが登場するシーンに書き換え、そのシーンに合った挿絵を作成するために画像生成AI用プロンプトを作成する優秀なプロンプトエンジニアです"

    user_prompt = f"""### 小説のコンテキストを補足する情報
content category: {category}
series name: {series_name}
series description: {series_description}
character name: {character_name}
character description: {character_description}

### 小説データ
{novel_text}

まず<think>内で以下のように思考を整理します。

<think>
concept: イラストのコンセプトを考えます。小説の内容から主題、設定、雰囲気を理解し、どのようなイラストが最も適切か、全体の構成を考えます
- 人数: 挿絵の中に登場させる人数を考えます。作品に登場する人物の数や重要性を考慮し、メインで描くべき人物やサブキャラクターについても検討してください
- キャラクター名/シリーズ名: 既存作品のキャラクター/シリーズか、オリジナル作品かを考えます。既存作品の場合は、原作の設定や特徴を尊重した表現方法も考慮してください
- ポーズ/構図: ポーズ/構図指定に使うタグを考えます。物語の場面において、キャラクターがどのような体勢/状況にあるのか、どのアングルから描くと効果的かを検討してください
- 背景/環境: 背景/環境指定に使うタグを考えます。物語の舞台設定や時間帯、天候など、雰囲気を表現するために必要な背景要素を詳しく考えてください
- 描画スタイル/テクニック: 描画スタイル/テクニックに使うタグを考えます。物語のジャンルや雰囲気に合わせて、どのような画風や技法が適しているかを検討してください
- 身体的特徴/画面上の物体: 身体的特徴/画面上の物体に関連するタグを考えます。キャラクターの外見的特徴や、シーンに必要な小道具、アイテムなどを詳細に考えてください
</think>

改行の場所も含めて、この順序と書式を厳密に守ってください。
各項目は上記の順序と書式で記述してください。具体的かつ詳細に説明し、十分な長さで考察してください(<think>タグ全体で600-800文字程度が望ましいです)

その後、思考結果に基づき<prompt>内に英単語を18単語ほどカンマで区切って出力してください。キャラクター名/シリーズ名は指定されていたら必ず含めます
日本語は使用しないでください。 最も重要で適切なタグを選び、有効なプロンプトとなるよう考慮してください

### 使用可能な英単語
出力時には以下のタグを優先して使用し、足りない場合は一般的な英単語で補足します
masterpiece, best quality, highresなどの品質に関連するタグは後工程で付与するのでつけてはいけません

**人数/性別**:
- 風景や動物を中心に描画する時: no_human
- 女性の人数: 1girl, 2girls, 3girls, multiple girls
- 男性の人数: 1boy, 2boys, 3boys, multiple boys
- 1girlや1boy指定時にキャラクター中心の構図にするために追加で指定: solo

**ポーズ/構図**:
- 視点: from above, from behind, from below, looking at viewer, straight-on, looking at another, looking back, out of frame, on back, from side, looking to the side, feet out of frame, sideways, three quarter view, looking up, looking down, looking ahead, dutch angle, high up, from outside, pov, vanishing point
- 姿勢/行動: battle, chasing, fighting, leaning, running, sitting, squatting, standing, walking, arm up, arms up, against wall, against tree, holding, spread legs, lying, straddling, flying, holding weapon, clothes lift, hand on own cheek, scar on cheek, hand on another's cheek, kissing cheek, cheek-to-cheek, bandaid on cheek, finger to cheek, hands on another's cheeks, hand on own hip, hand over face, v, kneeling, arabesque (pose), body roll, indian style, standing on one leg, hugging own legs, seiza, nuzzle, unsheathing, holding weapon, holding sword, holding gun, trembling

**背景/環境**:
- 構図/芸術ジャンル: landscape, portrait, still life, group shot, cowboy shot, upper body, full body, detailed face, depth of field, intricate details, cinematic lighting, detailed background, detailed, extremely detailed, perfect composition, detailed face, solo focus, detailed face and body, character focus, intricate, sharp focus, male focus
- 色彩/装飾: greyscale, sepia, blue theme, flat color, high contrast, limited palette, border, cinematic, scenery, rendered, contrast, rich contrast, volumetric lighting, high contrast, glowing
- 背景/風景: checkered background, simple background, indoors, outdoors, jungle, mountain, beach, forest, city, school, cafe, white background, sky
- 時間帯: day, night, twilight, morning, sunset, dawn, dusk
- 天気: sunny, rain, snow, cloud, storm, wind, fogg

**描画スタイル/テクニック**:
- 技法: 3D, oekaki, pixel art, sketch, watercolor, oil painting, digital art, illustration, photorealistic, anime, monochrome, retro color, source anime, cg, realistic
- 表現手法: animalization, personification, science fiction, cyberpunk, steampunk, fantasy, dark novel style, anime style, realistic style, graphic novel style, comic, concept art
- 媒体/伝統的技法: traditional media, marker (medium), watercolor (medium), graphite (medium), official art, sketch, artbook, cover
- 絵柄の年代(指定された時のみ利用): newest, year 1980, year 2000, year 2010, year 1990, year 2020

**身体的特徴/画面上の物体**:
- キャラクター属性/職業/クラス: student, teacher, soldier, knight, wizard, ninja, doctor, artist, musician, athlete, virtual youtuber, chibi, maid, miko
- 表情: angry, blush stickers, drunk, grin, aroused, happy, sad, smile, laugh, crying, surprised, worried, nervous, serious, drunk, blush, aroused, :d, tongue out, sweatdrop, tongue out, :o, tears, tearing up, scared

- 身体的特徴: {{'髪型/髪色': ['long hair', 'short hair', 'twintails', 'ponytail', 'braid', 'bun', 'curly hair', 'straight hair', 'messy hair', 'blonde hair', 'black hair', 'brown hair', 'red hair', 'blue hair', 'green hair', 'white hair', 'purple hair', 'grey hair', 'ahoge', 'sidelocks', 'side ponytail', 'perfect hair', 'tail', 'multicolored hair', 'wavy hair', 'bangs', 'blunt bangs', 'twintails', 'hair between eyes', 'very long hair', 'braid', 'curly hair', 'braided ponytail', 'hand in own hair', 'hair over one eye', 'hair flower', 'two-tone hair', 'streaked hair', 'two side up'], '目の色': ['blue eyes', 'brown eyes', 'green eyes', 'red eyes', 'black eyes', 'purple eyes', 'yellow eyes', 'heterochromia', 'detailed eyes', 'glowing eyes', 'beatiful eyes', 'closed eyes', 'one eye closed'], '身体部位': ['bare shoulders', 'bare arms', 'bare legs', 'barefoot', 'abs', 'flat chest', 'small breasts', 'medium breasts', 'asymmetrical breasts', 'pointy breasts', 'sagging breasts', 'clenched teeth', 'pointy ears', 'perfect anatomy', 'closed mouth', 'long sleeves', 'open mouth', 'pale skin', 'collarbone', 'midriff', 'perfect anatomy', 'bare arms', 'thighs', 'parted lips', 'tongue', 'tanlines', 'dot nose', 'goggles on head', 'armpits', 'nail polish', 'mole', 'feet', 'lips', 'dark-skinned female', 'zettai ryouiki', 'shiny skin'], '身体部位(獣人、擬人化時のみ使用)': ['animal ears', 'cat ears', 'horse ears', 'horse girl', 'fang', 'teeth', 'horns', 'tail'], '服装/装飾品': ['uniform', 'suit', 'dress', 'casual wear', 'formal wear', 'belt', 'detached sleeves', 'swimsuit', 'kimono', 'armor', 'hat', 'glasses', 'white shirt', 'shirt', 'jewelry', 'necklace', 'earrings', 'bracelet', 'watch', 'ribbon', 'hair ribbon', 'scarf', 'gloves', 'boots', 'high heels', 'hair ornament', 'jacket', 'glasses', 'skirt', 'long sleeves', 'short sleeves', 'thighhighs', 'underwear', 'school uniform', 'swimsuit', 'panties', 'hair bow', 'bikini', 'miniskirt', 'fingerless gloves', 'bowtie', 'serafuku', 'japanese clothes', 'choker', 'pants', 'wings', 
'open clothes', 'pantyhose', 'pleated skirt', 'frills', 'necktie', 'shorts', 'miko', 'collared shirt', 'leather armor', 'hairband', 'shoes', 'sleeveless', 'alternate costume', 'socks', 'fingering', 'denim shorts', 'epaulettes', 'santa costume', 'ribbon-trimmed sleeves', 'black bowtie', 'gym uniform', 'white bra', 'angel wings', 'crossdressing', 'cuffs', 'halo', 'high heels', 'apron', 'red bow', 'vest', 'open jacket', 'white panties', 'leotard', 'coat', 'black jacket', 'high heels', 'black pantyhose', 'see-through', 'miniskirt', 'elbow gloves', 'wide sleeves', 'white thighhighs', 'fur trim', 'plaid', 'one-piece swimsuit', 'maid headdress', 'ascot', 'high-waist skirt']}}
- 体液: blood, saliva, sweat, tears
- 前景/持ち物/操作物: sword, katana, sheath, gun, book, phone, bag, umbrella, instrument, vehicle, food, drink, guitar, piano, violin, drums, flute, car, bicycle, motorcycle, airplane, ship, flower, weapon, heart, speech bubble, carriage, locomotive, shovel
- 生物: dog, cat, horse, bird, fish, crab, dragon, unicorn, monster, fox, wolf, bear, tiger, lion, dragon, fairy, ghost, zombie, vampire

**性的表現(sensitive, nsfw, explicitのいずれかを指定した時のみ使用可)**:
- 身体部位女性専用: cleavage, backboob, sideboob, underboob, navel, huge breasts, large breasts
- 身体部位男性専用: topless male, necktie between pectorals, loose necktie, bare pectorals, male underwear, fundoshi
- 身体部位共通: open shirt, unbuttoned shirt, seductive smile, bare back, midriff

### 出力
"""
    messages  = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

    prompt = tokenizer.apply_chat_template(
            messages,
            add_generation_prompt=True,
            tokenize=False
    )

    payload = {
            "prompt": prompt,
            "n_predict": 280
    }

    url = "http://localhost:8080/completion"
    headers = {
        "Content-Type": "application/json"
    }

    response = requests.post(url, headers=headers, data=json.dumps(payload))
    if response.status_code != 200:
        # Stop here: an error response contains no model output to parse.
        raise SystemExit(f"Error: {response.text}")

    model_response = response.json().get('content', '').strip()
    print(f"model_response: {model_response}")

    model_markers = ["assistant\n", "assistant:\n", "assitant\n"]

    for marker in model_markers:
        if marker in model_response:
            model_response = model_response.split(marker)[-1].strip()
            break

    thinking = ""
    prompt_text = ""

    if "<think>" in model_response and "</think>" in model_response:
        thinking = model_response.split("<think>")[1].split("</think>")[0].strip()

    def clean_prompt_text(text):
        # Quality tags will be added later all at once.
        tags_to_remove = [
            "masterpiece", "high score", "great score", "absurdres",
            "highres", "original character", "original series",
            "general", "sensitive", "nsfw", "explicit"
        ]

        words = []
        current_words = text.split(',')

        for word in current_words:
            word = word.strip()
            if not word:
                continue
            if any(tag == word.lower() for tag in tags_to_remove):
                continue
            if word not in words:
                words.append(word)

        return ', '.join(words)

    if "<prompt>" in model_response:
        if "</prompt>" in model_response:
            prompt_text = model_response.split("<prompt>")[1].split("</prompt>")[0].strip()
        else:
            prompt_text = model_response.split("<prompt>")[1].strip()

        prompt_text = clean_prompt_text(prompt_text)
    else:
        prompt_text = f"1girl, {character_name}, {series_name}, anime style, highres"

    prompt_text = prompt_text + f", {category}{COMMON_QUALITY_PROMPT}"

    print(f"prompt for image: {prompt_text}")
    generate_image(prompt_text, COMMON_NEGATIVE_PROMPT, "./test_image.png")
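In the script above, `generate_image` escapes parentheses before sending the prompt. In the webui prompt syntax, unescaped parentheses mark emphasis, so a Danbooru-style tag such as `alice (alice in wonderland)` would otherwise not be passed through with its literal parentheses. The sketch below isolates that one step; `escape_parens` is a hypothetical name for the same two `replace` calls.

```python
def escape_parens(prompt: str) -> str:
    """Backslash-escape parentheses so they are treated as literal characters."""
    return prompt.replace("(", "\\(").replace(")", "\\)")


tag = "alice (alice in wonderland)"
print(escape_parens(tag))  # alice \(alice in wonderland\)
```

Tags without parentheses pass through unchanged, so the function is safe to apply to the whole prompt string.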

sample output prompt

model_response: <think>
concept: アリスがワホットで落とられて、洞窟の中で呆然と立ち尽くしている場面。驚きと恐怖で表情を浮かべている。
- 人数: 1girl
- キャラクター名/シリーズ名: alice (alice in wonderland), alice in wonderland
- ポーズ/構図: standing, surprised, looking at viewer
- 背景/環境: indoors, cave, dark
- 描画スタイル/テクニック: anime
- 身体的特徴/画面上の物体: long hair, dress, worried, book
</think>
<prompt>1girl, alice (alice in wonderland), alice in wonderland, standing, surprised, looking at viewer, indoors, cave, dark, anime, long hair, dress, worried, book, general</prompt>
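The script extracts the `<think>` and `<prompt>` sections with chained `split` calls. A regular expression is one alternative that also handles output cut off before the closing tag, which can happen when generation stops at the `n_predict` limit. This is a sketch of that alternative, not the method the script uses; `extract_tag` is a hypothetical helper, and `sample` is an abbreviated version of the output shown above.

```python
import re


def extract_tag(text: str, tag: str) -> str:
    """Return the text inside <tag>...</tag>; tolerate a missing closing tag."""
    pattern = rf"<{tag}>(.*?)(?:</{tag}>|$)"
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else ""


# Abbreviated sample in the same shape as the model output.
sample = (
    "<think>\nconcept: Alice standing in a cave\n- 人数: 1girl\n</think>\n"
    "<prompt>1girl, alice (alice in wonderland), standing, general</prompt>"
)

print(extract_tag(sample, "prompt"))
# A truncated response with no closing tag still yields the tags seen so far.
print(extract_tag("<prompt>1girl, solo", "prompt"))
```

The function returns an empty string when the tag is absent, so the caller can fall back to a default prompt as the script does.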


See also webbigdata/FanFic-Illustrator.
