---
title: NPC Main Model Inference Server
emoji: 🤖
colorFrom: blue
colorTo: pink
sdk: gradio
sdk_version: "5.44.1"
python_version: "3.10"
app_file: app.py
---

# NPC Main Model Inference Server (hf-serve)

This Space provides an inference API and a simple Gradio UI for the **NPC dialogue main model**.
It loads the [Base model](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) and the [LoRA adapter model](https://huggingface.co/m97j/npc_LoRA-fps) uploaded to the Hugging Face Hub and, based on the player's utterance and the game state, predicts the NPC's response, the emotion deltas, and the flag probabilities and thresholds.

---

## 🚀 Key features

- **API endpoint** `/predict_main`
  - Accepts a prompt in a JSON payload and returns the model's inference result
- **Web UI** `/ui`
  - Enter an NPC ID, location, and player utterance to check the response in real time
- **Custom head predictions**
  - `delta_head`: change in trust / relationship
  - `flag_head`: probability of each flag
  - `flag_threshold_head`: threshold of each flag
- **Live model updates**
  - After training in Colab, upload to the `latest` branch → calling `/ping_reload` reloads the model immediately

---

## 📂 Directory structure

```
hf-serve/
├─ app.py             # Gradio UI + API routing
├─ inference.py       # Model inference logic
├─ model_loader.py    # Model/tokenizer loading
├─ utils_prompt.py    # Prompt-building functions
├─ flags.json         # flag index → name mapping
├─ requirements.txt   # Dependencies
└─ README.md          # (this document)
```

---

## ⚙️ Inference logic overview

The core of this server is the `run_inference()` function, which handles the whole process of feeding a prompt to the NPC main model and predicting the response and the state changes.

### Processing flow

1. **Tokenize the prompt**
   - Convert the input prompt with the tokenizer into tensors
   - Apply the length limit (`MAX_LENGTH`) and device (`DEVICE`) settings
2. **Generate the language model response**
   - Run `model.generate()` with the predefined inference parameters (`GEN_PARAMS`) → produces the NPC's dialogue text
   - Decode the generated tokens into the final string
3. **Extract hidden states**
   - Run the model with `output_hidden_states=True`
   - Take the hidden state of the last layer
4. **Pool at the marker-token positions**
   - Average (pool) the hidden states at the positions where the marker token appears → used as the vector representing the NPC state
   - If the token is absent, use the hidden state of the last token
5. **Custom head predictions**
   - `delta_head`: predicts the change in trust / relationship
   - `flag_head`: predicts the probability of each flag occurring
   - `flag_threshold_head`: predicts the threshold of each flag
6. **index → name mapping**
   - Convert the prediction vectors into `{flag_name: value}` dictionaries, based on the order in `flags.json` (`flags_order`)

### Return format

```json
{
  "npc_output_text": "",
  "deltas": { "trust": 0.xx, "relationship": 0.xx },
  "flags_prob": { "flag_name": probability, ... },
  "flags_thr": { "flag_name": threshold, ... }
}
```

---

## 📜 Prompt format

During training, the model uses prompts with the following structure.

```
NPC_ID={npc_id}
NPC_LOCATION={npc_location}
TAGS:
quest_stage={quest_stage}
relationship={relationship}
trust={trust}
npc_mood={npc_mood}
player_reputation={player_reputation}
style={style}
LORE: ...
DESCRIPTION: ...
...
...
...
```

---

## 💡 **How this differs from typical LLM inference**

This server does more than generate text: it extracts a state vector from the marker-token positions and uses the custom heads to predict the **emotion deltas** and the **flag probabilities/thresholds** at the same time.
As a result, dialogue generation and the game state update can be **handled in a single inference**.
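
Steps 4 and 6 of the processing flow can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `inference.py`: the function names `pool_marker_states` and `name_flags`, the `marker_id` value, and the toy numbers are hypothetical, and nested lists stand in for the tensors the server would work with.

```python
from typing import Dict, List, Sequence


def pool_marker_states(
    hidden: Sequence[Sequence[float]],
    token_ids: Sequence[int],
    marker_id: int,
) -> List[float]:
    """Mean-pool last-layer hidden states at the marker-token positions.

    Falls back to the last token's hidden state when the marker is absent.
    `marker_id` is a hypothetical token id chosen for this example.
    """
    positions = [i for i, tok in enumerate(token_ids) if tok == marker_id]
    if not positions:
        return list(hidden[-1])
    dim = len(hidden[0])
    return [
        sum(hidden[i][d] for i in positions) / len(positions)
        for d in range(dim)
    ]


def name_flags(values: Sequence[float], flags_order: Sequence[str]) -> Dict[str, float]:
    """Turn a prediction vector into {flag_name: value} using a fixed order."""
    if len(values) != len(flags_order):
        raise ValueError("prediction vector and flags_order differ in length")
    return dict(zip(flags_order, values))


# Toy sequence: three tokens with 2-dimensional hidden states.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 8.0]]
token_ids = [7, 99, 99]

pooled = pool_marker_states(hidden, token_ids, marker_id=99)     # mean of rows 1 and 2
fallback = pool_marker_states(hidden, token_ids, marker_id=123)  # marker absent: last row

flags_prob = name_flags([0.87, 0.02], ["give_item", "end_npc_main_story"])
print(pooled, fallback, flags_prob)
```

In the real server the pooled vector would be passed to `delta_head`, `flag_head`, and `flag_threshold_head`; those heads are left out here because their shapes are not described in this document.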
---

## 🎯 Inference parameters

| Parameter | Meaning | Effect |
|----------|------|------|
| `temperature` | Sampling temperature (0.0 to 1.0+) | Lower values are more deterministic; higher values increase diversity |
| `do_sample` | Whether to sample | `False` gives greedy/beam search; `True` gives probability-based sampling |
| `max_new_tokens` | Limit on the number of newly generated tokens | Limits response length |
| `top_p` | Cumulative probability cutoff for nucleus sampling | Controls diversity (0.9 samples only from the most probable tokens that together cover 90% of the probability mass) |
| `top_k` | Sample only from the k most probable tokens | Controls diversity (50 keeps only the top 50 candidates) |
| `repetition_penalty` | Repetition suppression coefficient | Values above 1.0 reduce repetition |
| `stop` / `eos_token_id` | Tokens that end generation | Stops at a specific string/token |
| `presence_penalty` / `frequency_penalty` | Controls how often specific tokens appear | Mainly used in OpenAI-style APIs |
| `seed` | Random seed | Ensures reproducibility |

These parameters are **not used during training**; they apply only at **inference time**, when the model generates a response.

## 💡 Usage examples

- **Deterministic classification/judgment** (e.g. `_llm_trigger_check` YES/NO)
  ```python
  # Greedy decoding with a very short answer
  temperature = 0.0
  do_sample = False
  max_new_tokens = 2
  ```
  → Always the same output for the same input; short, definite answers
  [used when giving specific conditions to the local fallback model in ai_server/]

- **Natural dialogue/creative generation** (e.g. main/fallback dialogue generation)
  ```python
  # Sampling with nucleus filtering and a mild repetition penalty
  temperature = 0.7
  top_p = 0.9
  do_sample = True
  repetition_penalty = 1.05
  max_new_tokens = 200
  ```
  → Provides diversity and naturalness
  [used for main model inference]

hf-serve uses the natural dialogue/creative parameter example above as-is.
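
To make the effect of `temperature`, `top_k`, and `top_p` from the parameter table more concrete, the following sketch applies them to a toy three-token distribution. It is a simplified illustration in plain Python, not the Transformers implementation; the function name `filter_distribution` and the toy logits are assumptions made for this example.

```python
import math
from typing import Dict


def filter_distribution(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_k: int = 0,
    top_p: float = 1.0,
) -> Dict[str, float]:
    """Return the renormalised sampling distribution after the three filters."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy decoding instead")

    # Temperature scaling followed by a numerically stable softmax.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(
        ((tok, w / total) for tok, w in weights.items()),
        key=lambda item: item[1],
        reverse=True,
    )

    # top-k: keep the k most probable tokens (0 disables the filter), then renormalise.
    if top_k > 0:
        ranked = ranked[:top_k]
        kept_mass = sum(prob for _, prob in ranked)
        ranked = [(tok, prob / kept_mass) for tok, prob in ranked]

    # top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


toy_logits = {"yes": 2.0, "no": 1.0, "maybe": 0.0}

nucleus = filter_distribution(toy_logits, top_p=0.9)                # drops the least likely token
single = filter_distribution(toy_logits, top_k=1)                   # only the top token survives
sharp = filter_distribution(toy_logits, temperature=0.1, top_p=0.9) # low temperature concentrates mass
print(sorted(nucleus), sorted(single), sorted(sharp))
```

With these toy logits, lowering the temperature or tightening `top_k`/`top_p` shrinks the candidate set toward the single most likely token, which is why the deterministic preset needs so few tokens while the dialogue preset keeps some diversity.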
---

## 🌐 API vs. UI

| Path | Input format | Internal processing |
|------|-----------|-----------|
| `/predict_main` | Complete prompt string | Inference runs on it as-is |
| `/ui` | NPC ID, Location, Utterance | Builds the prompt with `build_webtest_prompt()`, then runs inference |

---

## 📌 API usage example

### Request

`POST /api/predict_main`

```json
{
  "session_id": "abc123",
  "npc_id": "mother_abandoned_factory",
  "prompt": "...",
  "max_tokens": 200
}
```

### Response

```json
{
  "session_id": "abc123",
  "npc_id": "mother_abandoned_factory",
  "npc_response": "That is a truly remarkable story.",
  "deltas": { "trust": 0.42, "relationship": -0.13 },
  "flags": { "give_item": 0.87, "end_npc_main_story": 0.02 },
  "thresholds": { "give_item": 0.65, "end_npc_main_story": 0.5 }
}
```

---

## 🔄 Model update flow

1. Training finishes in Colab
2. The model is uploaded to the `latest` branch on the Hugging Face Hub
3. Colab calls `/api/ping_reload`
4. The Space re-downloads and loads the latest model

---

## 🛠 How to run

### Run locally

```bash
git clone https://huggingface.co/spaces/m97j/PersonaChatEngine
cd PersonaChatEngine
pip install -r requirements.txt
python app.py
```

### Run on the Hugging Face Space

- Web UI: `https://m97j-PersonaChatEngine.hf.space/ui`
- API: `POST https://m97j-PersonaChatEngine.hf.space/api/predict_main`

---

## 🛠 Runtime environment

- Python 3.10
- FastAPI, Gradio, Transformers, PEFT, Torch
- Inference is faster when a GPU is available

---

## 💡 Cost optimization tips

- Switching to Free CPU under Space Settings → Hardware incurs no charges
- When using a GPU, stop the Space with the Stop button after testing
- The Space goes to sleep automatically after 48 hours without requests

---

## 🔗 Related repositories

- **Full project overview & AI server code**: [GitHub - persona-chat-engine](https://github.com/m97j/persona-chat-engine)
- **Model adapter files (HF Hub)**: [Hugging Face Model Repo](https://huggingface.co/m97j/npc_LoRA-fps)

---