metadata
title: NPC Main Model Inference Server
emoji: 🤖
colorFrom: blue
colorTo: pink
sdk: gradio
sdk_version: 5.44.1
python_version: '3.10'
app_file: app.py

NPC Main Model Inference Server (hf-serve)

This Space provides an inference API and a simple Gradio UI for the NPC dialogue main model.
It loads the base model and the LoRA adapter model uploaded to the Hugging Face Hub and,
based on the player's utterance and the game state, predicts the NPC's response, the emotion deltas (delta), and the flag probabilities and thresholds.


🚀 Key Features

  • API endpoint /predict_main
    • Accepts a prompt as a JSON payload and returns the model's inference result
  • Web UI /ui
    • Enter an NPC ID, a location, and a player utterance to see the response in real time
  • Custom head predictions
    • delta_head: change in trust / relationship
    • flag_head: probability of each flag
    • flag_threshold_head: threshold of each flag
  • Live model updates
    • After training in Colab, upload to the latest branch → the model is reloaded immediately when /ping_reload is called

📂 Directory Structure

hf-serve/
 ├─ app.py             # Gradio UI + API routing
 ├─ inference.py       # Model inference logic
 ├─ model_loader.py    # Model/tokenizer loading
 ├─ utils_prompt.py    # Prompt-building functions
 ├─ flags.json         # flag index → name mapping
 ├─ requirements.txt   # Dependency packages
 └─ README.md          # (this document)

⚙️ Inference Logic Overview

The core of this server is the run_inference() function,
which handles the whole process of feeding a prompt to the NPC main model and predicting the response and the state changes.

Processing Flow

  1. Tokenize the prompt

    • Convert the input prompt into tensors with the tokenizer
    • Apply the length limit (MAX_LENGTH) and the device setting (DEVICE)
  2. Generate the language model response

    • Run model.generate() with the predefined inference parameters (GEN_PARAMS)
      → produces the NPC's dialogue text
    • Decode the generated tokens into the final string
  3. Extract hidden states

    • Run the model with output_hidden_states=True
    • Take the hidden state of the last layer
  4. Pool over token positions

    • Average (pool) the hidden states at the positions of the <STATE> token
      → used as the vector that represents the NPC state
    • If there is no <STATE> token, use the hidden state of the last token
  5. Predict with the custom heads

    • delta_head: predicts the change in trust / relationship
    • flag_head: predicts the probability that each flag fires
    • flag_threshold_head: predicts the threshold of each flag
  6. Map index → name

    • Using the order defined in flags.json (flags_order),
      convert each prediction vector into a {flag_name: value} dictionary
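
Step 4 can be illustrated with a small, dependency-free sketch. It uses plain Python lists in place of tensors, and the function name `pool_state_vector` and its arguments are hypothetical; the actual code in inference.py presumably operates on Transformers/Torch tensors and may differ.

```python
from typing import List, Sequence


def pool_state_vector(
    hidden: Sequence[Sequence[float]],
    input_ids: Sequence[int],
    state_token_id: int,
) -> List[float]:
    """Mean-pool last-layer hidden states at <STATE> positions.

    hidden[i] is the hidden vector of token i (assumed layout).
    Falls back to the last token's vector when <STATE> is absent.
    """
    positions = [i for i, tok in enumerate(input_ids) if tok == state_token_id]
    if not positions:
        # No <STATE> token in the prompt: use the last token instead.
        return list(hidden[-1])
    dim = len(hidden[0])
    return [
        sum(hidden[i][d] for i in positions) / len(positions)
        for d in range(dim)
    ]


# Toy example: three tokens, two of which are the (hypothetical) <STATE> id 9.
toy_hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = pool_state_vector(toy_hidden, input_ids=[7, 9, 9], state_token_id=9)
fallback = pool_state_vector(toy_hidden, input_ids=[7, 8, 8], state_token_id=9)
```

The pooled vector is what the three custom heads would then consume.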

Return Format

{
  "npc_output_text": "<NPC response>",
  "deltas": { "trust": 0.xx, "relationship": 0.xx },
  "flags_prob": { "flag_name": probability, ... },
  "flags_thr": { "flag_name": threshold, ... }
}
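
As a rough sketch of how the head outputs could be assembled into this structure: the helper names below are hypothetical, and the assumption that delta_head emits its two values in the order [trust, relationship] is not confirmed by this document.

```python
from typing import Dict, List, Sequence


def vector_to_named(values: Sequence[float], flags_order: Sequence[str]) -> Dict[str, float]:
    """Pair each prediction with its flag name, following flags_order."""
    if len(values) != len(flags_order):
        raise ValueError("prediction vector and flags_order differ in length")
    return dict(zip(flags_order, values))


def build_result(
    text: str,
    delta_vec: Sequence[float],
    flag_probs: Sequence[float],
    flag_thrs: Sequence[float],
    flags_order: List[str],
) -> dict:
    """Assemble the dictionary shape shown above."""
    return {
        "npc_output_text": text,
        # Assumed ordering of the delta_head output: [trust, relationship].
        "deltas": {"trust": delta_vec[0], "relationship": delta_vec[1]},
        "flags_prob": vector_to_named(flag_probs, flags_order),
        "flags_thr": vector_to_named(flag_thrs, flags_order),
    }


result = build_result(
    "Hello.",
    delta_vec=[0.1, -0.2],
    flag_probs=[0.9, 0.1],
    flag_thrs=[0.5, 0.5],
    flags_order=["give_item", "end_npc_main_story"],
)
```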

📜 Prompt Format

The model is trained on prompts with the following structure.

<SYS>
NPC_ID={npc_id}
NPC_LOCATION={npc_location}
TAGS:
 quest_stage={quest_stage}
 relationship={relationship}
 trust={trust}
 npc_mood={npc_mood}
 player_reputation={player_reputation}
 style={style}
</SYS>
<RAG>
LORE: ...
DESCRIPTION: ...
</RAG>
<PLAYER_STATE>
...
</PLAYER_STATE>
<CTX>
...
</CTX>
<PLAYER>...
<STATE>
<NPC>

💡 Differences from Ordinary LLM Inference

This server does more than generate text:
it extracts a state vector based on the <STATE> token and uses the custom heads to predict the emotion deltas (delta)
and the flag probabilities and thresholds at the same time.
This allows dialogue generation and the game-state update to be handled in a single inference.


🎯 Inference Parameters

| Parameter | Meaning | Effect |
| --- | --- | --- |
| temperature | Sampling temperature (0.0 to 1.0+) | Lower is more deterministic; higher increases diversity |
| do_sample | Whether to sample | False uses greedy/beam search; True uses probability-based sampling |
| max_new_tokens | Limit on the number of newly generated tokens | Limits the response length |
| top_p | Cumulative-probability cutoff for nucleus sampling | Controls diversity (0.9 samples only from the most likely tokens that together cover 90% of the probability mass) |
| top_k | Sample only from the k most probable tokens | Controls diversity (50 keeps only the top 50 candidates) |
| repetition_penalty | Repetition suppression coefficient | Values above 1.0 reduce repetition |
| stop / eos_token_id | Tokens that end generation | Stops at a specific string/token |
| presence_penalty / frequency_penalty | Controls how often specific tokens appear | Mainly used in OpenAI-style APIs |
| seed | Random seed | Ensures reproducibility |

These parameters are not used during training;
they apply only at inference time, when the model generates a response.
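
To make the effect of temperature and top_p concrete, here is a small pure-Python sketch over a toy three-token vocabulary. It illustrates the general technique only; it is not the Transformers implementation, and the function names are made up for this example.

```python
import math
from typing import Dict


def apply_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Softmax over logits divided by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy decoding instead")
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most probable tokens until their cumulative mass reaches top_p,
    then renormalise the survivors."""
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


toy_logits = {"a": 2.0, "b": 1.0, "c": 0.0}
warm = apply_temperature(toy_logits, 1.0)   # flatter distribution
cold = apply_temperature(toy_logits, 0.5)   # sharper distribution
nucleus = top_p_filter(warm, 0.9)           # drops the least likely token "c"
```

Lowering the temperature concentrates probability on the top token, while top_p removes the low-probability tail before sampling.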

💡 Usage Examples

  • Deterministic classification/judgment
    (e.g. _llm_trigger_check YES/NO)

    temperature = 0.0
    do_sample = False
    max_new_tokens = 2

    → Always the same output for the same input; short, definitive answers [used when instructing the local fallback model in ai_server/ to check a specific condition]

  • Natural dialogue/creative generation
    (e.g. main/fallback dialogue generation)

    temperature = 0.7
    top_p = 0.9
    do_sample = True
    repetition_penalty = 1.05
    max_new_tokens = 200

    → Ensures diversity and naturalness [used for main model inference]

hf-serve uses the natural dialogue/creative parameter set above as-is.
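
One possible way to keep both presets side by side is a pair of dictionaries whose keys match the keyword arguments of model.generate(). The constant and function names below are hypothetical; the document only states that the server keeps its settings in GEN_PARAMS.

```python
# Values copied from the two examples above.
DETERMINISTIC_PARAMS = {
    "temperature": 0.0,   # not used for sampling when do_sample is False
    "do_sample": False,
    "max_new_tokens": 2,
}

DIALOGUE_PARAMS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "do_sample": True,
    "repetition_penalty": 1.05,
    "max_new_tokens": 200,
}

PRESETS = {"classify": DETERMINISTIC_PARAMS, "dialogue": DIALOGUE_PARAMS}


def pick_gen_params(task: str) -> dict:
    """Return a copy of the preset so callers can tweak it safely."""
    return dict(PRESETS[task])


gen_params = pick_gen_params("dialogue")
```

A call would then look like `model.generate(**inputs, **gen_params)`, assuming `inputs` is the tokenized prompt.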


🌐 API vs. UI

| Path | Input format | Internal processing |
| --- | --- | --- |
| /predict_main | Complete prompt string | Runs inference on it as-is |
| /ui | NPC ID, Location, Utterance | Builds the prompt with build_webtest_prompt(), then runs inference |

📌 API Usage Example

Request

POST /api/predict_main
{
  "session_id": "abc123",
  "npc_id": "mother_abandoned_factory",
  "prompt": "<SYS>...<NPC>",
  "max_tokens": 200
}

Response

{
  "session_id": "abc123",
  "npc_id": "mother_abandoned_factory",
  "npc_response": "That is a truly remarkable story.",
  "deltas": { "trust": 0.42, "relationship": -0.13 },
  "flags": { "give_item": 0.87, "end_npc_main_story": 0.02 },
  "thresholds": { "give_item": 0.65, "end_npc_main_story": 0.5 }
}
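
A client-side sketch using only the Python standard library. The URL is the Space endpoint listed in this README; the helper names are hypothetical, and the rule that a flag counts as triggered when its probability is at least its threshold is an assumption, since the document does not say how the game consumes the two values.

```python
import json
import urllib.request
from typing import List

API_URL = "https://m97j-PersonaChatEngine.hf.space/api/predict_main"


def build_request(session_id: str, npc_id: str, prompt: str,
                  max_tokens: int = 200) -> urllib.request.Request:
    """Build (but do not send) the POST request shown above."""
    payload = {
        "session_id": session_id,
        "npc_id": npc_id,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def triggered_flags(response: dict) -> List[str]:
    """Assumed rule: a flag fires when its probability >= its threshold."""
    thresholds = response["thresholds"]
    return [name for name, prob in response["flags"].items()
            if prob >= thresholds[name]]


request = build_request("abc123", "mother_abandoned_factory", "<SYS>...<NPC>")
sample_response = {
    "flags": {"give_item": 0.87, "end_npc_main_story": 0.02},
    "thresholds": {"give_item": 0.65, "end_npc_main_story": 0.5},
}
fired = triggered_flags(sample_response)
```

To actually send the request, pass it to `urllib.request.urlopen(request)` and parse the body with `json.load`.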

🔄 Model Update Flow

  1. Training finishes in Colab
  2. The model is uploaded to the latest branch on the Hugging Face Hub
  3. Colab calls /api/ping_reload
  4. The Space re-downloads and loads the latest model

🛠 How to Run

Running locally

git clone https://huggingface.co/spaces/m97j/PersonaChatEngine
cd PersonaChatEngine
pip install -r requirements.txt
python app.py

Running on Hugging Face Spaces

  • Web UI: https://m97j-PersonaChatEngine.hf.space/ui
  • API: POST https://m97j-PersonaChatEngine.hf.space/api/predict_main

🛠 Runtime Environment

  • Python 3.10
  • FastAPI, Gradio, Transformers, PEFT, Torch
  • Inference is faster when a GPU is available

💡 Cost Optimization Tips

  • Switching to Free CPU under Space Settings → Hardware incurs no charges
  • When using a GPU, stop the Space with the Stop button after testing
  • The Space goes to sleep automatically after 48 hours without requests

🔗 Related Repositories