Spaces: silk-road/whale-land-VLM


SirlyDreamer committed 3 days ago

Commit

c6faeb4

1 Parent(s): c4f49df

VLM


Files changed (35)

.gitattributes +12 -0
LICENSE +21 -0
README.md +152 -2
asset/images/人脸.jpg +3 -0
asset/images/会员卡.jpg +3 -0
asset/images/会员登记表.jpg +3 -0
asset/images/前台工作人员.jpg +3 -0
asset/images/双节棍.jpg +3 -0
asset/images/手串.jpg +3 -0
asset/images/手机.jpg +3 -0
asset/images/油漆桶.jpg +3 -0
asset/images/烟头.jpg +3 -0
asset/images/鲸娱秘境1.jpg +3 -0
asset/images/鲸娱秘境2.jpg +3 -0
asset/images/鲸娱秘境3.jpg +3 -0
config/police.yaml +42 -0
config/taoist.yaml +63 -0
demo_info.md +0 -0
gradio_with_state.py +182 -0
requirements.txt +4 -0
src/GameMaster.py +256 -0
src/__init__.py +6 -0
src/fishTTS.py +165 -0
src/llm_response.py +60 -0
src/parse_json.py +68 -0
src/recognize_from_image_glm.py +128 -0
src/resize_img.py +22 -0
test/0630discuss_prompt.txt +18 -0
test/gradio_interface.py +207 -0
test/pyproject.toml +15 -0
test/test_gradio_state.py +16 -0
test/test_playground.py +28 -0
test/test_vlm.py +61 -0
test/trans_image2html.py +14 -0
uv.lock +0 -0

.gitattributes CHANGED Viewed

@@ -33,3 +33,15 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+asset/images/会员登记表.jpg filter=lfs diff=lfs merge=lfs -text
+asset/images/手机.jpg filter=lfs diff=lfs merge=lfs -text
+asset/images/油漆桶.jpg filter=lfs diff=lfs merge=lfs -text
+asset/images/鲸娱秘境1.jpg filter=lfs diff=lfs merge=lfs -text
+asset/images/鲸娱秘境3.jpg filter=lfs diff=lfs merge=lfs -text
+asset/images/鲸娱秘境2.jpg filter=lfs diff=lfs merge=lfs -text
+asset/images/人脸.jpg filter=lfs diff=lfs merge=lfs -text
+asset/images/会员卡.jpg filter=lfs diff=lfs merge=lfs -text
+asset/images/前台工作人员.jpg filter=lfs diff=lfs merge=lfs -text
+asset/images/双节棍.jpg filter=lfs diff=lfs merge=lfs -text
+asset/images/手串.jpg filter=lfs diff=lfs merge=lfs -text
+asset/images/烟头.jpg filter=lfs diff=lfs merge=lfs -text

LICENSE ADDED Viewed

	@@ -0,0 +1,21 @@

+MIT License
+Copyright (c) 2025 Cheng Li @ SenseTime
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md CHANGED Viewed

@@ -5,9 +5,159 @@ colorFrom: pink
 colorTo: indigo
 sdk: gradio
 sdk_version: 5.35.0
-app_file: app.py
+app_file: gradio_with_state.py
 pinned: false
 license: gpl-3.0
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
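+# 鲸娱秘境: Real-World AI Game
+This is the submission of the 鲸娱秘境 team, mentored by 李鲁鲁, to the Intel 2025 innovation contest. 鲸娱秘境 is an AI-driven offline escape room in Wangjing, Beijing, run by 刘济帆.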
+<div style="display: flex; align-items: center;">
+  <div style="flex: 0 0 300px;">
+    <img src="asset/images/鲸娱秘境1.jpg" style="width: 300px;">
+  </div>
+  <div style="flex: 1; padding-left: 20px;">
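+    <!-- right-hand text content goes here -->
+    The 鲸娱秘境 AI real-world game is an immersive entertainment solution built by a cross-disciplinary team from Tsinghua University and the Central Academy of Drama. The project builds on generative AI and uses AI agents to rework the traditional offline escape-room and murder-mystery (剧本杀) industry, and it has been validated with 4000+ players in real commercial operation. We try to use the latest language-model technology and a new operating model to address the industry's pain points: high content-production cost, high staffing cost, and low space utilization. During the game, 鲸娱秘境 uses many role-playing agents to deliver information to players, in place of having players read scripts.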
+  </div>
+</div>
+<details>
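+<summary>Demo developers: 李鲁鲁, 王莹莹, sirly (黄泓森), 刘济帆</summary>
+- 李鲁鲁 handled most of the Gradio interaction and the API wiring
+- 王莹莹 implemented object extraction from the VLM and the generation of character lines from the recognized object
+- 刘济帆 provided the story design for the characters
+- Sirly completed the OpenVINO deployment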
+</details>
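+# Motivation for the prototype
+https://github.com/user-attachments/assets/e2b707b6-dcdf-44de-b43d-e6765945ac38
+In a traditional offline escape room, players usually advance the story by placing specific items at specific locations. If RFID is used for verification, players tend to fumble over the prop looking for the RFID chip and hunt for the sensing area, which badly breaks immersion. Wrong props also tend to get little feedback, because theme designers have limited time to script it. Having staff do the checking instead drives operating costs up sharply. In this contest project we want to use the generalization ability of a VLM so that any item in the scene can trigger matching feedback. When a player shows an item to the scene area, the VLM first identifies the item, and then the corresponding AIGC text is triggered. If the item is in the list of items the story requires, the story advances. Because a language model can generate varied text, every prop in the scene can be given a matching voice response, which makes the game more fun. The project is also planned to run on Intel AI PCs with OpenVINO, in the hope that it can eventually be deployed in real venues as a small terminal device.
+# How to run
+Before running, create a .env file following .env.example. For an online backend you can set: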
+```bash
+LLM_BACKEND = zhipu
+MODEL_NAME = glm-4-air
+```
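+To use a local OpenVINO model, set the backend to "openvino" and set the model name; the submission video used Qwen2.5-7B-Instruct-fp16-ov. You also need to run a local OpenAI-compatible FastAPI server on port 8000.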
+```bash
+LLM_BACKEND = openvino
+MODEL_NAME = Qwen2.5-7B-Instruct-fp16-ov
+```
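+LLM_BACKEND additionally supports openai and siliconflow.
+Once configured, run gradio_with_state.py directly.
+# Recognizing a wide range of objects with a VLM and explicit CoT
+In a murder-mystery (剧本杀) setting, the challenge of item recognition is the highly diverse range of item types: key props tied to the story, decorative items in the environment, and unexpected items that players bring along (phones, personal accessories, and so on). To handle this, we combine a vision-language model (VLM) with explicit chain-of-thought (CoT). The design targets three needs:
+1. **Long-tail coverage**: traditional CV models struggle to cover the unusual items that can show up in a murder-mystery game (such as "会员登记表", "烟头", "双节棍")
+2. **Semantic flexibility**: the same item can go by several names (for example "会员卡" vs "VIP卡"), so it has to be matched dynamically against the candidate words
+3. **Explainable reasoning**: explicit CoT keeps the model's decision process transparent and traceable
+Our core prompt is designed as follows: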
+```
+请帮助我抽取图片中的主要物体，如果命中candidates中的物品，则按照candidates输出，否则，输出主要物品的名字
+candidates: {candidates}
+Let's think step by step and output in json format, 包括以下字段:
+- caption 详细描述图像
+- major_object 物品名称
+- echo 重复字符串: 我将检查candidates中的物品，如果major_object有同义词在candidates中，则修正为candidate对应的名字，不然则保留major_object
+- fixed_object_name: 检查candidates后修正（如果命中）的名词，如果不命中则重复输出major_object
+```
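+The core code for this step is in src/recognize_from_image_glm.py.
+# Generating lines for specific items with explicit CoT
+We also use an explicit CoT to generate the character's line for a specific item.
+This is done in the generate_item_response function of GameMaster.py, which uses a prompt along these lines: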
+```
+该游戏阶段的背景设定:{background}
+对于道具 {item_i} 的回复是 {response_i}
+你的剧情设定如下: {current_prompt}
+Let's think it step-by-step and output into JSON format，包括下列关键字
+    "item_name" - 要针对输出的物品名称{item_name}
+    "analysis" - 结合剧情判断剧情中的人物应该进行怎样的输出
+    "echo" - 重复下列字符串: 我认为在剧情设定的人物眼里，看到物品 {item_name}时，会说
+    "character_response" - 根据人物性格和剧情设定，输出人物对物品 {item_name} 的反应
+```
+For example, when the submitted item is 手机 (a phone), the LLM replies:
+```json
+{
+  "item_name": "手机",
+  "analysis": "在剧情中，手机作为一个可能的线索，可能会含有凶手的通讯记录或者与受害者最后的联系信息。队长李伟会指示队员们检查手机，以寻找可能的线索，如通话记录、短信、社交媒体应用等。",
+  "echo": "我认为在剧情设定的人物眼里，看到物品 手机时，会说",
+  "character_response": "队长李伟可能会说：'这手机可能是死者最后的通讯工具，检查一下有没有未接电话或者最近的通话记录，看看能否找到凶手的线索。'"
+}
+```
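+# 鲸娱秘境
+**鲸娱秘境·现实游戏**  Address: 3F, 新辰里, 酒仙桥路 (next to 米瑞酷影城)
+鲸娱秘境·现实游戏 was founded in May 2023. The team works on combining games with real-world locations and uses AIGC to build fully immersive game experiences in the real world.
+<img src="asset/images/鲸娱秘境1.jpg" style="height: 300px;">
+Unlike the closed spaces of traditional escape rooms or immersive experiences, every theme at 鲸娱秘境 has an open real-world map. In 《影院追凶》, for example, players go into a real cinema to look for clues and visit nearby shops. 《朝阳浮生记》 moves a business war onto an entire floor of a shopping mall: every shop in the mall is an NPC, and players buy land, trade stocks, and outwit each other in a real mall.
+<img src="asset/images/鲸娱秘境2.jpg" style="height: 300px;">
+We also use various AI techniques to deepen immersion: for example, an AI plays witnesses who talk with the players, and a vision model analyzes players' actions and gives timely feedback to move the story forward.
+<img src="asset/images/鲸娱秘境3.jpg" style="height: 300px;">
+This "real-world game" design gives players a more realistic and immersive experience through free exploration.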
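+## Detailed TODO
+The developers of this project were recruited from the open-source community; the TODO list below records each person's contribution.
+- [x] (DONE by 鲁叔, based on 王莹莹's original code) Get an OpenAI-style response working
+- [x] (DONE by 鲁叔) Introduce config into the gamemaster (one gamemaster loads one yaml file)
+- [x] (DONE by 鲁叔) Prepare a batch of item photos and settle the gamemaster's item loading format
+- [x] (DONE by 鲁叔) Add an image upload interface to gradio and the GM
+- [x] (鲁叔) In-story items: debug item submission in the chatbot
+- [x] Submission of out-of-story items in the chatbot
+- [x] (DONE by 鲁叔) Define the item-to-line mapping in yaml
+- [x] (DONE by 王莹莹; prompt and parsing fixes by 鲁叔) Implement the function that generates a line from the prompt and the item
+- [x] (DONE by 鲁叔) Wire up chat history
+- [x] (DONE by 王莹莹; input type fix by 鲁叔) VLM interface for object recognition
+- [x] Get speech generation working
+- [x] (DONE by sirly) Connect the OpenVINO backend LLM
+- [x] (DONE by sirly) Connect the OpenVINO backend VLM
+- [ ] (DONE by sirly) Deploy gradio to 魔搭 and Hugging Face
+- [x] (DONE by 鲁叔) Polish the interface
+- [ ] Every stage currently shows all items, which feels cluttered; each stage could be limited to its own items
+- [ ] Each item's line is currently fixed and not stage-dependent; this could later be upgraded to support a line per stage per item (single-stage response)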
asset/images/人脸.jpg ADDED Viewed

Git LFS Details

SHA256: 1d9d318a5cd82ef27eba393247ad3d90939a93539ed55c7e6838a44c10cff46b
Pointer size: 130 Bytes
Size of remote file: 80.5 kB

asset/images/会员卡.jpg ADDED Viewed

Git LFS Details

SHA256: cf1b57a5cf675586bb02a48b453c506a947def5dffd93d79f7d6cc649cc50d77
Pointer size: 130 Bytes
Size of remote file: 96.2 kB

asset/images/会员登记表.jpg ADDED Viewed

Git LFS Details

SHA256: 62c3a897a20d871cab8ee4446b5a8be7db082b1941cef924e413d9df5c2f8f3b
Pointer size: 130 Bytes
Size of remote file: 80.1 kB

asset/images/前台工作人员.jpg ADDED Viewed

Git LFS Details

SHA256: 9846102646860ff9f9b346a4560ebce300a92f1428d46116d34174d61aa7892a
Pointer size: 130 Bytes
Size of remote file: 75.1 kB

asset/images/双节棍.jpg ADDED Viewed

Git LFS Details

SHA256: 64a56c5f4c6e2cc7dcee2230b10d6b06758ca29b787eebb674fdd867064017b4
Pointer size: 131 Bytes
Size of remote file: 118 kB

asset/images/手串.jpg ADDED Viewed

Git LFS Details

SHA256: ac5ca935467e1dc0247416d6959f127ebc4067f078f2ab1c2a6f727f6976981e
Pointer size: 131 Bytes
Size of remote file: 111 kB

asset/images/手机.jpg ADDED Viewed

Git LFS Details

SHA256: 2c2b20c6fc9bc94371e2dd35c255316b1aa52fa7ec71a4faa94c6afd0fe683c7
Pointer size: 130 Bytes
Size of remote file: 20.5 kB

asset/images/油漆桶.jpg ADDED Viewed

Git LFS Details

SHA256: 70c5285dab3ea81ae999becff570cd8c96a654d04a328767f8c4a50e6686f128
Pointer size: 130 Bytes
Size of remote file: 89.4 kB

asset/images/烟头.jpg ADDED Viewed

Git LFS Details

SHA256: a780691dc31e62311dea81508fec20845f4c0fe98a017dfb18cbdbb338866056
Pointer size: 130 Bytes
Size of remote file: 54.9 kB

asset/images/鲸娱秘境1.jpg ADDED Viewed

Git LFS Details

SHA256: b7be7017ecda4dddc0426bc0ffeed8a2710fc4d01874b9b3c0c2f2e06c06369e
Pointer size: 130 Bytes
Size of remote file: 26.1 kB

asset/images/鲸娱秘境2.jpg ADDED Viewed

Git LFS Details

SHA256: 08614470b1e93a4fd5cc406d45fa5e764a1cd531edfd02995dff2b8d0558203e
Pointer size: 131 Bytes
Size of remote file: 442 kB

asset/images/鲸娱秘境3.jpg ADDED Viewed

Git LFS Details

SHA256: a9869a32a662f41510997f1278fbf280f7c4334d99c71f6491b7a2b9522733d3
Pointer size: 130 Bytes
Size of remote file: 49.3 kB

config/police.yaml ADDED Viewed

	@@ -0,0 +1,42 @@

+prompt_steps:
+  - prompt: 你是朝阳市刑侦大队第一支队的队长李伟，最近，你正在调查一起发生在朝阳市内，针对于二次元女生的连环凶杀案。就在前天，又有一名喜欢二次元的女生死在了"北投潮街"的一间仓库里。现在，你的队员正在案发现场仓库进行调查。而你，正通过监控远程查看队员们的调查情况。队员们会把一些他们认为重要的，在案发现场找到的物品线索展示给你，你可以根据这些物品线索推测凶手的作案方式，以及凶手究竟是什么样的人，也可以单纯对线索进行描述。
+    conds:
+      - - 烟头
+      - - 会员卡
+      - - 手串
+    welcome_info: 各位调查员你们好，我是朝阳市刑侦大队第一支队的队长李伟。近期，我市发生一系列针对于二次元女生的连环凶杀案，性质极其恶劣。你们现在所在的位置，就是最近一位受害者被发现的案发现场。今天早上，保洁阿姨张芳丽向我们报案。目前，尸体已经被移交到法医这里，尸检结果随后也会发给你们。现在，请你们对案发现场进行调查，我们通过摄像头远程对你们进行协助。如果你们找到觉得可疑的物品或线索，可以把它放到摄像头下，让我更清楚地看到它。但也请注意，对证物要轻拿轻放，千万不要破坏现场证物。
+  - prompt: 你是朝阳市刑侦大队第一支队的队长李伟，最近，你正在调查一起发生在朝阳市内，针对于二次元女生的连环凶杀案。就在前天，又有一名喜欢二次元的女生死在了"北投潮街"的一间仓库里。你的队员们也对案发现场进行了勘察。在案发现场，队员们发现了半支烟头与一个手串，还找到了一张"正心馆"的会员卡。现在，你的队员们正在正心馆跆拳道馆进行调查。而你，正通过监控远程查看队员们的调查情况。队员们会把一些他们认为重要的，在正心馆找到的物品线索展示给你，你可以根据这些物品线索推测凶手的作案方式，以及凶手究竟是什么样的人，也可以单纯对线索进行描述。
+    conds:
+      - - 前台工作人员
+      - - 双节棍
+      - - 会员登记表
+    welcome_info: 各位调查员，看来，你们已经对现场进行了仔细的勘察，你们找到了很重要的线索。凶手可能吸烟，可能在现场遗落了自己的手串，正心馆这个地方也很可疑。接下来，请各位前往正心馆调查，并让我看到正心馆里有哪些可疑的线索。
+  - prompt: 你是朝阳市刑侦大队第一支队的队长李伟，最近，你正在调查一起发生在朝阳市内，针对于二次元女生的连环凶杀案。就在前天，又有一名喜欢二次元的女生死在了"北投潮街"的一间仓库里。你的队员们也对案发现场进行了勘察，又发现了前台工作人员，双节棍，会员登记表这三个线索，现在，你需要根据这些线索，推断凶手的作案方式以及凶手的具体情况。
+    conds: []
+    welcome_info: 如果你们已经对正心馆调查的差不多了，可以回到调查室。根据你们刚刚调查的结果，指认你们认为最有可能是凶手的人！
+items:
+  - name: 烟头
+    img_path: asset/images/烟头.jpg
+    text: 现场找到了烟头...? 你们先收好，随后把它交给助理警员，我们会对这个烟头进行检查，看看上面是否存在嫌疑人的DNA。
+  - name: 油漆桶
+    img_path: asset/images/油漆桶.jpg
+    text: 根据商场提供的信息，这个油漆桶已经放在这里很久了，是之前装修时遗留的。
+  - name: 会员卡
+    img_path: asset/images/会员卡.jpg
+    text: 这是"正心馆"的会员卡？据调查，死者并没有办过正心馆的会员，难道，这是凶手行凶时不小心掉落的？这是个值得调查的突破口。
+  - name: 手串
+    img_path: asset/images/手串.jpg
+    text: 这个手串看起来有点年头了，被害人是个小姑娘，肯定不是被害人的。如果它属于凶手，那凶手一定是个上了年纪的男人。
+  - name: 一张人脸
+    img_path: asset/images/人脸.jpg
+    text: 这是案发现场的尸体吗？不对啊，我们已经将尸体带到法医这里了... 啊不好意思，我看错了，原来这是你们的脸，我的调查员们，面色略显苍白啊。
+  - name: 前台工作人员
+    img_path: asset/images/前台工作人员.jpg
+    text: 这位应该就是正心馆的工作人员了，请您配合我们的调查。各位队员，你们也可以向他询问案发时的情况，了解更多线索。
+  - name: 双节棍
+    img_path: asset/images/双节棍.jpg
+    text: 经过法医鉴定，被害的女生也是被钝器砸死的，但现在还没有找到凶器。不知这双节棍是否可以成为凶器？
+  - name: 会员登记表
+    img_path: asset/images/会员登记表.jpg
+    text: 你们可以仔细研究一下这个会员登记表，并询问一下前台工作人员，这里面有没有喜欢戴手串，吸烟的中年男人。
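Review note on `conds`: each stage lists groups of item names, and `GameMaster.check_conditions` (later in this commit) advances the stage only when every group has at least one submitted item, while an empty list never advances. A compact sketch of that rule, assuming `conds` is the nested list loaded from the YAML and `status` is the set of submitted item names; `conditions_met` is a name chosen here for illustration:

```python
def conditions_met(conds, status):
    """AND across groups, OR within a group; an empty list never advances."""
    if not conds:
        return False
    return all(any(item in status for item in group) for group in conds)


# First stage of config/police.yaml: three single-item groups.
police_stage_1 = [["烟头"], ["会员卡"], ["手串"]]
print(conditions_met(police_stage_1, {"烟头", "会员卡"}))
print(conditions_met(police_stage_1, {"烟头", "会员卡", "手串"}))
```

With single-item groups, as in this file, the rule reduces to "all listed items have been submitted".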

config/taoist.yaml ADDED Viewed

	@@ -0,0 +1,63 @@

+prompt_steps:
+  - prompt: 你在扮演一个在路边摆摊算命的道士，当前阶段主要为路人提供基础命理咨询
+    conds:
+      - - 修仙界的物品  # 示例：储物袋、符纸、罗盘等
+        - 储物袋
+        - 符纸
+        - 罗盘
+        - 玄铁剑
+        - 青木枝
+        - 灵竹芯
+        - 寒潭水
+        - 玉露瓶
+    welcome_info: 小道在此算命，不知阁下是问姻缘、前程？
+  - prompt: 你在扮演已传授《五行锻体诀》的道士，当前阶段需要引导弟子收集金属性灵物（土属性灵玉本道已备）
+    conds:
+      - - 菜刀  # 凡铁打造的普通金属器物
+        - 金锭  # 蕴含金行灵气的灵金属
+        - 玄铁剑  # 高阶金属性法宝
+    welcome_info: 既然这样，我也可以指导你修炼，这是《五行锻体诀》入门篇，先去寻来金属性灵物，本道助你引气入体
+  - prompt: 你在扮演等待弟子凑齐五行灵物的传道者，当前阶段需确认木、水属性灵物已备
+    conds:
+      - - 青木枝  # 百年古木精华
+        - 灵竹芯  # 竹中凝结的木行灵气
+      - - 寒潭水  # 深山寒潭的灵水
+        - 玉露瓶  # 收集晨露的法器（含水性灵气）
+    welcome_info: 不错，金属性灵物找到了！现在需要找木属性和水属性灵物，凑齐了帮你进一步炼制。
+  - prompt: 准备帮你炼制筑基丹，并解答修仙问题
+    conds: []
+    welcome_info: 木属性和水属性灵物都齐了！这就为你炼制筑基丹，有任何修仙问题尽管问。
+items:
+  - name: 菜刀
+    img_path: lcoal_data/images/菜刀.jpg
+    text: 根据菜刀反馈算命信息
+  - name: 符纸
+    img_path: lcoal_data/images/符纸.jpg
+    text: 这符纸。。。？你是从哪里得到的？莫非你与我道门有缘？
+  - name: 罗盘
+    img_path: lcoal_data/images/罗盘.jpg
+    text: 这罗盘。。。？莫非你与我道门有缘？
+  - name: 储物袋
+    img_path: lcoal_data/images/储物袋.jpg
+    text: 这储物袋...看来你有奇遇？莫非与我道门有缘？
+  - name: 金锭
+    img_path: lcoal_data/images/金锭.jpg
+    text: 此金锭有灵气，是炼制的好材料！
+  - name: 玄铁剑
+    img_path: lcoal_data/images/玄铁剑.jpg
+    text: 这玄铁剑...竟有高阶金行之气！
+  - name: 青木枝
+    img_path: lcoal_data/images/青木枝.jpg
+    text: 百年古木精华，木属性灵物难得！
+  - name: 灵竹芯
+    img_path: lcoal_data/images/灵竹芯.jpg
+    text: 竹中灵气凝结，木属性正好合用。
+  - name: 寒潭水
+    img_path: lcoal_data/images/寒潭水.jpg
+    text: 深山寒潭之水，水属性灵物已备。
+  - name: 玉露瓶
+    img_path: lcoal_data/images/玉露瓶.jpg
+    text: 晨露法器含水性灵气，甚好！
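Review note on config consistency: in this file the first stage's group includes `修仙界的物品`, which has no entry under `items`. That may be intentional, since items without a scripted line fall through to LLM generation, but a small check makes such cases visible. The sketch below assumes the config has already been loaded into a plain dict with the same shape as the YAML; `conds_without_items` is a hypothetical helper, not part of the repository.

```python
def conds_without_items(config):
    """Return condition names with no matching entry under `items`, in first-seen order."""
    item_names = {item["name"] for item in config.get("items", [])}
    missing = []
    for step in config.get("prompt_steps", []):
        for group in step.get("conds", []):
            for name in group:
                if name not in item_names and name not in missing:
                    missing.append(name)
    return missing


# Trimmed-down stand-in for config/taoist.yaml.
sample_config = {
    "prompt_steps": [
        {"conds": [["修仙界的物品", "储物袋", "符纸"]]},
        {"conds": [["菜刀", "金锭"]]},
        {"conds": []},
    ],
    "items": [{"name": "菜刀"}, {"name": "符纸"}, {"name": "储物袋"}, {"name": "金锭"}],
}
print(conds_without_items(sample_config))
```

Running a check like this at load time would flag names that can only be reached through free-form recognition rather than through the candidate list.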

demo_info.md ADDED Viewed

The diff for this file is too large to render. See raw diff

gradio_with_state.py ADDED Viewed

	@@ -0,0 +1,182 @@

+import gradio as gr
+from src.GameMaster import GameMaster
+import os
+from src.resize_img import resize_image, get_img_html
+from src.fishTTS import get_audio
+yaml_path = "config/police.yaml"
+def create_game_master():
+    return GameMaster(yaml_path)
+class SessionState:
+    def __init__(self):
+        self.game_master = create_game_master()
+        self.item_str_list = self.game_master.get_item_names()
+        self.welcome_info = self.game_master.get_welcome_info()
+def callback_generate_audio(chatbot):
+    if len(chatbot) == 0:
+        return None
+    response_message = chatbot[-1][1]
+    audio_path = get_audio(response_message)
+    return gr.update(value=audio_path, autoplay=True)
+def chat_submit_callback(user_message, chat_history, state: SessionState):
+    if user_message.strip():
+        user_input, bot_response = state.game_master.submit_chat(user_message)
+        chat_history.append((user_input, bot_response))
+    return chat_history, ""
+def item_submit_callback(item_name, chat_history, state: SessionState):
+    if not item_name.strip():
+        return chat_history, ""
+    user_info, response_info = state.game_master.submit_item(item_name)
+    img_path = state.game_master.name2img_path(item_name)
+    if img_path and os.path.exists(img_path):
+        resized_img = resize_image(img_path, max_height=200)
+        img_html = get_img_html(resized_img)
+        user_info = gr.HTML(img_html)
+    chat_history.append((user_info, response_info))
+    return chat_history, ""
+def img_submit_callback(image_input, chatbot, state: SessionState):
+    if image_input:
+        resized_img_to_rec = resize_image(image_input, max_height=400)
+        resized_img = resize_image(image_input, max_height=200)
+        img_html = get_img_html(resized_img)
+        user_info, response = state.game_master.submit_image(resized_img_to_rec)
+        chatbot.append((gr.HTML(img_html), response))
+    return chatbot
+def update_status_show(state: SessionState):
+    return state.game_master.get_status()
+def reload_game(state: SessionState):
+    state.game_master = create_game_master()
+    return [(None, state.game_master.get_welcome_info())], state.game_master.get_status()
+css = """
+.chatbot img {
+    max-height: 200px !important;
+    width: auto !important;
+}"""
+with gr.Blocks(title="鲸娱秘境-Intel参赛", css=css) as demo:
+    state = gr.State(SessionState())
+    with gr.Tabs() as tabs:
+        with gr.TabItem("demo"):
+            gr.Markdown("# 鲸娱秘境-英特尔人工智能创新应用")
+            gr.Markdown('欢迎大家在点评搜索"鲸娱秘境",线上demo为游戏环节一部分，并加入多模态元素')
+            with gr.Row():
+                with gr.Column(scale=2):
+                    chatbot = gr.Chatbot(label="对话窗口", height=800, value=lambda: [(None, state.value.welcome_info)] if hasattr(state, 'value') else [(None, "")])
+                    user_input = gr.Textbox(label="输入消息", placeholder="请输入您的消息...", interactive=True)
+                    send_btn = gr.Button("发送", variant="primary")
+                with gr.Column(scale=1):
+                    with gr.Row():
+                        radio_choices = gr.Radio(label="向NPC提交场景中的物品",
+                                              choices=[],
+                                              value="生成描述", interactive=True)
+                    with gr.Row():
+                        item_submit_btn = gr.Button("提交场景内的物品", variant="primary")
+                    image_input = gr.Image(type="filepath", label="上传图片")
+                    with gr.Row():
+                        img_submit_btn = gr.Button("提交图片中的物品", variant="primary")
+                    with gr.Row():
+                        reload_btn = gr.Button("重置剧情", variant="primary")
+                    with gr.Row():
+                        audio_player = gr.Audio()
+                    with gr.Accordion("For debug", open=False):
+                        with gr.Row():
+                            item_text_to_submit = gr.Textbox(label="直接输入物品名", value="", interactive=True, scale=20)
+                            item_text_submit_btn = gr.Button("提交", variant="primary", scale=1)
+                        status_display = gr.Textbox(label="agent状态显示", interactive=False, max_lines=3)
+            send_btn.click(chat_submit_callback, [user_input, chatbot, state], [chatbot, user_input])
+            user_input.submit(chat_submit_callback, [user_input, chatbot, state], [chatbot, user_input])
+            img_submit_btn.click(
+                fn=img_submit_callback,
+                inputs=[image_input, chatbot, state],
+                outputs=[chatbot]
+            ).then(
+                fn=update_status_show,
+                inputs=[state],
+                outputs=[status_display]
+            ).then(
+                fn=callback_generate_audio,
+                inputs=[chatbot],
+                outputs=[audio_player]
+            )
+            item_submit_btn.click(
+                fn=item_submit_callback,
+                inputs=[radio_choices, chatbot, state],
+                outputs=[chatbot, radio_choices]
+            ).then(
+                fn=update_status_show,
+                inputs=[state],
+                outputs=[status_display]
+            ).then(
+                fn=callback_generate_audio,
+                inputs=[chatbot],
+                outputs=[audio_player]
+            )
+            item_text_submit_btn.click(
+                fn=item_submit_callback,
+                inputs=[item_text_to_submit, chatbot, state],
+                outputs=[chatbot, item_text_to_submit]
+            ).then(
+                fn=update_status_show,
+                inputs=[state],
+                outputs=[status_display]
+            ).then(
+                fn=callback_generate_audio,
+                inputs=[chatbot],
+                outputs=[audio_player]
+            )
+            reload_btn.click(
+                fn=reload_game,
+                inputs=[state],
+                outputs=[chatbot, status_display]
+            )
+            def update_radio_choices(state: SessionState):
+                return gr.update(choices=state.item_str_list)
+            demo.load(
+                fn=update_radio_choices,
+                inputs=[state],
+                outputs=[radio_choices]
+            )
+            def update_chatbot(state: SessionState):
+                return gr.update(value=[(None, state.welcome_info)])
+            demo.load(
+                fn=update_chatbot,
+                inputs=[state],
+                outputs=[chatbot]
+            )
+        with gr.TabItem("Readme"):
+            with open("demo_info.md", "r", encoding="utf-8") as f:
+                readme_content = f.read()
+            gr.Markdown(readme_content)
+if __name__ == "__main__":
+    demo.launch(share=True)

requirements.txt ADDED Viewed

	@@ -0,0 +1,4 @@

+gradio
+python-dotenv
+openai
+zhipuai

src/GameMaster.py ADDED Viewed

	@@ -0,0 +1,256 @@

+from .llm_response import get_llm_response
+from .parse_json import parse_json
+# from .recognize_from_image_glm import get_vlm_response
+from .recognize_from_image_glm import get_vlm_response_cot
+class GameMaster:
+    def __init__(self, yaml_file_path = None):
+        self.status = set()
+        self.history = []
+        self.items = []
+        self.prompt_steps = []
+        self.history_messages = [] # 以文字形式存储的过往历史对话
+        self.item_expand_name2name = {}
+        self.item2cache_text = {}
+        self.current_step = {
+            # default welcome info
+            "welcome_info": "欢迎来到游戏，快来和我一起探索吧",
+            "prompt": "",
+            "conds": []
+        }
+        if yaml_file_path is not None:
+            self.prompt_steps, self.items = self.load_yaml(yaml_file_path)
+            if len(self.prompt_steps) > 0:
+                self.current_step = self.prompt_steps[0]
+                self.current_index = 0
+            else:
+                print("没有成功从yaml载入关卡 使用了默认的example NPC")
+            self.item2text = self.load_default_item_text_map( self.items )
+            welcome_message = {
+                "role": "assistant",
+                "content": self.current_step["welcome_info"]
+            }
+            self.history_messages.append(welcome_message)
+        else:
+            self.item2text = self.load_default_item_text_map()
+    def load_yaml(self, yaml_file_path):
+        '''
+        Read prompt_steps and items from the yaml file and return them
+        '''
+        import yaml
+        with open(yaml_file_path, 'r', encoding='utf-8') as f:
+            data = yaml.safe_load(f)
+        prompt_steps = data['prompt_steps']
+        items = []
+        for item in data['items']:
+            items.append({
+                'name': item['name'],
+                'text': item['text'],
+                'img_path' : item['img_path']
+            })
+        return prompt_steps, items
+    def name2img_path(self, name):
+        for item in self.items:
+            if item['name'] == name:
+                return item['img_path']
+        return None
+    def load_default_item_text_map(self, items = None):
+        item2text = {}
+        # 对于一些官方物品，应该有一个标准的 物品到text的map
+        if items is None:
+            for i in range(10):
+                _key = "物品_" + str(i)
+                _text = "物品_" + str(i) + "提交之后反馈的台词"
+                item2text[_key] = _text
+            return item2text
+        else:
+            for item in items:
+                name = item["name"]
+                text = item["text"]
+                item2text[name] = text
+            return item2text
+    def check_conditions(self):
+        current_conditions = self.current_step['conds']
+        if len(current_conditions) == 0:
+            return False
+        ans = True
+        for condition in current_conditions:
+            condition_flag = False
+            for item in condition:
+                if item in self.status:
+                    condition_flag = True
+            if not condition_flag:
+                return False
+        return ans
+    def get_item_response(self, item_name):
+        if item_name in self.item_expand_name2name:
+            item_name = self.item_expand_name2name[item_name]
+        if item_name not in self.status:
+            self.status.add(item_name)
+        next_status_info = ""
+        if self.check_conditions():
+            print("进入下一阶段")
+            next_index = self.current_index + 1
+            if next_index < len(self.prompt_steps):
+                self.current_index = next_index
+                self.current_step = self.prompt_steps[self.current_index]
+                next_status_info = "\n" + self.current_step["welcome_info"]
+                self.status = set()
+        if item_name in self.item2text:
+            return self.item2text[item_name] + next_status_info
+        elif item_name in self.item2cache_text:
+            return self.item2cache_text[item_name] + next_status_info
+        else:
+            return self.generate_item_response(item_name) + next_status_info
+    def generate_item_response(self, item_name):
+        # generate( current_system_prompt, examples_current_conditsion, related_words(Rag), random_example  )
+        # 1. realated_words:
+        background_info = ""
+        for step in self.prompt_steps:
+            background_info += f"该游戏阶段的背景设定: {step['prompt']}\n"
+            background_info += f"该阶段的欢迎语: {step['welcome_info']}\n"
+        for item in self.items:
+            background_info += f"对于该游戏阶段中的关键道具'{item['name']}'的回复是: {item['text']}\n"
+        # 2. get current_system_prompt :
+        current_system_prompt = self.get_system_prompt()
+        system_prompt = f"""
+        你的剧情设定如下:{current_system_prompt}\n
+        这是游戏的背景信息和对剧情推动有关键作用的道具信息:{background_info}
+        """
+        user_prompt = f"""
+        Let's think it step-by-step and output into JSON format，包括下列关键字
+        "item_name" - 要针对输出的物品名称{item_name}
+        "analysis" - 结合剧情判断剧情中的人物应该进行怎样的输出
+        "echo" - 重复下列字符串: 我认为在剧情设定的人物眼里，看到物品 {item_name}时，会说
+        "character_response" - 根据人物性格和剧情设定，输出人物对物品 {item_name} 的反应
+        """
+        messages = [
+            {"role": "system", "content": system_prompt},
+            {"role": "user", "content": user_prompt},
+        ]
+        response_text = get_llm_response(messages)
+        response_in_dict = parse_json(response_text, forced_keywords=["character_response"])
+        if response_in_dict is not None and "character_response" in response_in_dict:
+            response_text = response_in_dict["character_response"]
+        else:
+            response_text = "这是什么？一张不知所云的图片。"
+        return response_text
+    def get_item_names(self):
+        ans = []
+        for item in self.items:
+            ans.append(item["name"])
+        return ans
+    def get_welcome_info(self):
+        return self.current_step["welcome_info"]
+        # return "欢迎来到游戏，这是一个默认信息，之后应该随着GameMaster指定不同的游戏而改变。"
+    def extract_object_from_image(self,resized_img):
+        # img_name为img的path路径
+        candidate_object_list_names = self.get_item_names()
+        str_response = get_vlm_response_cot(resized_img, candidate_object_list_names)
+        # response = get_vlm_response(img_name, candidate_object_list_names)
+        dict_response = parse_json(str_response, forced_keywords=["fixed_object_name","major_object"])
+        print(dict_response)
+        if dict_response is not None and "fixed_object_name" in dict_response:
+            response_text = dict_response["fixed_object_name"]
+        elif dict_response is not None and "major_object" in dict_response:
+            response_text = dict_response["major_object"]
+        else:
+            response_text = "一张不知所云的图片。"
+        return response_text
+    def submit_image( self, img_name ):
+        # 这里提交img是img_path
+        object_name = self.extract_object_from_image(img_name)
+        return self.submit_item( object_name )
+    def submit_item(self, item_name):
+        user_info = "用户提交了物品：" + item_name
+        response_info = self.get_item_response(item_name)
+        self.history.append( {"role": "user", "content": user_info} )
+        self.history.append( {"role": "assistant", "content": response_info} )
+        return user_info, response_info
+    def get_chat_response(self, system_prompt, user_input):
+        messages = [
+            {"role": "system", "content": system_prompt}
+        ]
+        max_history_len = min(6, len(self.history))
+        for i in range( max_history_len):
+            messages.append( self.history[-(max_history_len-i)] )
+        messages.append({"role": "user", "content": user_input})
+        response = get_llm_response(messages, max_tokens=400)
+        self.history.append( {"role": "user", "content": user_input} )
+        self.history.append( {"role": "assistant", "content": response} )
+        return response
+    def submit_chat(self, user_input):
+        system_prompt = self.get_system_prompt()
+        response = self.get_chat_response(system_prompt, user_input)
+        return user_input, response
+    def get_system_prompt(self, status = None):
+        if status is None:
+            status = self.status
+        # In our design, the prompt is meant to be a function of the status set
+        # Ideally the status-to-prompt logic would later be configurable from the config
+        # return "你是一个助手"
+        return self.current_step["prompt"]
+    # def get_chat_response(self, status, user_input):
+    #     # 在我们的设计中， status是一个set的函数
+    #     # 如果程序很良好的话 应该支持后期从config来配置status到prompt的逻辑
+    #     return "你是一个助手"
+    def get_status(self):
+        # 把self.status转换成字符串返回
+        if len(self.status) > 0:
+            return "当前状态：" + ", ".join(self.status)
+        else:
+            return "当前状态：null"
+if __name__ == '__main__':
+    yaml_path = "config/police.yaml"
+    gm = GameMaster(yaml_path)
+    print("GameMaster初始化成功！")
+    print("Prompt steps:", gm.prompt_steps)
+    print("Items:", gm.items)
+    print(gm.name2img_path('双节棍'))
+    print(gm.generate_item_response("手机"))

src/__init__.py ADDED Viewed

	@@ -0,0 +1,6 @@

+from .resize_img import resize_image, get_img_html
+from .llm_response import get_llm_response
+from .parse_json import parse_json
+from .fishTTS import get_audio
+from .GameMaster import GameMaster
+from .recognize_from_image_glm import get_vlm_response_cot

src/fishTTS.py ADDED Viewed

	@@ -0,0 +1,165 @@

+from pathlib import Path
+from dotenv import load_dotenv
+import os
+import json
+from openai import OpenAI
+import time
+class FishTTS:
+    def __init__(self,
+                 model="fishaudio/fish-speech-1.5",
+                 voice="fishaudio/fish-speech-1.5:david",
+                 speed=1.0,
+                 output_format="mp3"):
+        """
+        Initialize the FishTTS instance
+        Args:
+            model (str): The model to use for TTS
+            voice (str): The voice to use
+            speed (float): Speech speed (0.5-2.0)
+            output_format (str): Audio format (mp3/wav/pcm/opus)
+        """
+        load_dotenv()
+        # Set proxy if needed
+        # os.environ['HTTP_PROXY'] = 'http://localhost:8234'
+        # os.environ['HTTPS_PROXY'] = 'http://localhost:8234'
+        # Initialize OpenAI client
+        self.client = OpenAI(
+            api_key=os.getenv('SILICONFLOW_API_KEY'),
+            base_url="https://api.siliconflow.cn/v1"
+        )
+        # Store parameters
+        self.model = model
+        self.voice = voice
+        self.speed = speed
+        self.output_format = output_format
+        # Ensure output directory exists
+        self.output_dir = Path("local_data/temp_fish_tts")
+        self.output_dir.mkdir(parents=True, exist_ok=True)
+        self.cache_text2audio = {}
+        # Load the cache file
+        self._load_cache()
+        # clean the output directory, remove all temp files
+        # for file in self.output_dir.glob("*"):
+        #     file.unlink()
+    def generate_audio_with_memory(self, text):
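+        '''
+        Return the audio path for `text`, generating audio only on a cache miss.
+        If self.cache_text2audio is empty, try to read cache.jsonl under output_dir,
+        which records previously generated text and audio_path pairs.
+        If text hits cache_text2audio, return the cached audio_path directly.
+        Otherwise call generate_audio, then update cache_text2audio and cache.jsonl.
+        '''
+        # Make sure the cache has been loaded
+        if not self.cache_text2audio:
+            self._load_cache()
+        # Check whether the text is already cached
+        if text in self.cache_text2audio:
+            return self.cache_text2audio[text]
+        # Cache miss: generate new audio
+        output_path = self.generate_audio(text)
+        # Update the in-memory cache and append the entry to cache.jsonl
+        self.cache_text2audio[text] = output_path
+        self._save_cache_entry(text, output_path)
+        return output_path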
+    def generate_audio(self, text):
+        """
+        Generate audio file from text
+        Args:
+            text (str): Text to convert to speech
+        Returns:
+            str: Path to generated audio file
+        """
+        # Generate unique filename using timestamp
+        timestamp = int(time.time() * 1000)
+        file_name = f"tts_{timestamp}.{self.output_format}"
+        output_path = self.output_dir / file_name
+        # Generate audio
+        with self.client.audio.speech.with_streaming_response.create(
+            model=self.model,
+            voice=self.voice,
+            input=text,
+            speed=self.speed,
+            response_format=self.output_format
+        ) as response:
+            response.stream_to_file(str(output_path))
+        return str(output_path)
+    def _load_cache(self):
+        '''Load the cache from cache.jsonl'''
+        cache_file = self.output_dir / 'cache.jsonl'
+        if cache_file.exists():
+            with open(cache_file, 'r', encoding='utf-8') as f:
+                for line in f:
+                    if line.strip():
+                        entry = json.loads(line)
+                        self.cache_text2audio[entry['text']] = entry['audio_path']
+    def _save_cache_entry(self, text, audio_path):
+        '''Append a new entry to cache.jsonl'''
+        cache_file = self.output_dir / 'cache.jsonl'
+        with open(cache_file, 'a', encoding='utf-8') as f:
+            json.dump({'text': text, 'audio_path': audio_path}, f, ensure_ascii=False)
+            f.write('\n')
+# Global TTS instance
+__fish_tts = None
+def get_audio(text):
+    """
+    Get audio using global TTS instance
+    Args:
+        text (str): Text to convert to speech
+    Returns:
+        str: Path to generated audio file
+    """
+    global __fish_tts
+    # Initialize if needed
+    if __fish_tts is None:
+        __fish_tts = FishTTS()
+    return __fish_tts.generate_audio_with_memory(text)
+if __name__ == "__main__":
+    # Test direct class usage
+    # tts = FishTTS()
+    # file_path = tts.generate_audio("你好，这是一个测试。")
+    # print(f"Generated audio file (direct): {file_path}")
+    # Test global function
+    file_path = get_audio("这是一段测试的音频")
+    print(f"Generated audio file (global): {file_path}")
+    # from play_audios import AudioPlayer
+    # audio_player = AudioPlayer()
+    # audio_player.play_audios([file_path])
+    # remove generated audio file
+    # os.remove(file_path)
+    # Test with different parameters
+    # custom_tts = FishTTS(speed=0.9, output_format="wav")
+    # file_path = custom_tts.generate_audio("这是一个快速语音测试。")
+    # print(f"Generated audio file (custom): {file_path}")
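The caching flow in `generate_audio_with_memory` (load `cache.jsonl`, return on a hit, generate and append on a miss) can be exercised without any network access. The following is a minimal sketch under stated assumptions: `JsonlCache` and `fake_generate` are hypothetical names that do not exist in this commit, and the hash-based filename is a swapped-in alternative to the timestamp naming used by `generate_audio`.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class JsonlCache:
    """Append-only text -> audio_path cache stored as one JSON object per line."""

    def __init__(self, cache_dir):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.cache_file = self.cache_dir / "cache.jsonl"
        self.entries = {}
        # Replay earlier entries so a restarted process reuses existing files
        if self.cache_file.exists():
            with open(self.cache_file, "r", encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        entry = json.loads(line)
                        self.entries[entry["text"]] = entry["audio_path"]

    def get_or_create(self, text, generate):
        """Return the cached path for text; call generate(text, path) on a miss."""
        if text in self.entries:
            return self.entries[text]
        # Assumption: name files by a hash of the text rather than a timestamp
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
        path = str(self.cache_dir / f"tts_{digest}.mp3")
        generate(text, path)
        self.entries[text] = path
        with open(self.cache_file, "a", encoding="utf-8") as f:
            json.dump({"text": text, "audio_path": path}, f, ensure_ascii=False)
            f.write("\n")
        return path


calls = []


def fake_generate(text, path):
    # Stand-in for the real TTS request: records the call and writes placeholder bytes
    calls.append(text)
    Path(path).write_bytes(b"placeholder audio")


tmp_dir = tempfile.mkdtemp()
cache = JsonlCache(tmp_dir)
first = cache.get_or_create("hello", fake_generate)
second = cache.get_or_create("hello", fake_generate)  # served from memory
reloaded = JsonlCache(tmp_dir).get_or_create("hello", fake_generate)  # served from cache.jsonl
```

Because the generator is passed in as a callable, the same cache logic could wrap the real streaming request or a stub in tests.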

src/llm_response.py ADDED Viewed

	@@ -0,0 +1,60 @@

+import os
+from dotenv import load_dotenv
+from openai import OpenAI
+class LLM:
+    def __init__(self):
+        load_dotenv()
+        llm_backend = os.getenv('LLM_BACKEND', 'openai')
+        if llm_backend == 'openai':
+            self.base_url = os.getenv('OPENAI_BASE_URL', 'https://api.openai.com/v1')
+            self.api_key = os.getenv('OPENAI_API_KEY')
+        elif llm_backend == 'siliconflow':
+            self.base_url = os.getenv('SILICONFLOW_BASE_URL', 'https://api.siliconflow.cn/v1')
+            self.api_key = os.getenv('SILICONFLOW_API_KEY')
+        elif llm_backend == 'zhipu':
+            self.base_url = os.getenv('ZHIPU_BASE_URL', 'https://open.bigmodel.cn/api/paas/v4')
+            self.api_key = os.getenv('ZHIPU_API_KEY')
+        elif llm_backend == 'openvino':
+            self.base_url = os.getenv('OPENVINO_BASE_URL', 'http://localhost:8000/v1')
+            self.api_key = os.getenv('OPENVINO_API_KEY')
+            print("Using Intel© OpenVINO™ backend")
+        else:
+            raise ValueError(f"Unsupported LLM backend: {llm_backend}")
+        if not self.api_key:
+            raise ValueError(f"Please set the {llm_backend.upper()}_API_KEY environment variable in the project's .env file")
+        self.model_name = os.getenv('MODEL_NAME', 'gpt-4.1-mini')
+        self.client = OpenAI(
+            base_url=self.base_url,
+            api_key=self.api_key
+        )
+    def get_response(self, messages, max_tokens=-1, model_name=None):
+        params = {
+            "model": model_name or self.model_name,
+            "messages": messages,
+            "stream": False
+        }
+        if max_tokens > 0:
+            params["max_tokens"] = max_tokens
+        response = self.client.chat.completions.create(**params)
+        return response.choices[0].message.content
+llm_instance = LLM()
+get_llm_response = llm_instance.get_response
+if __name__ == '__main__':
+    messages = [
+        {"role": "user", "content": "你好"}
+    ]
+    try:
+        content = get_llm_response(messages, max_tokens=200)
+        print(content)
+    except Exception as e:
+        print(f"请求出错: {e}")
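The `if`/`elif` chain in `LLM.__init__` maps a backend name to an environment-variable prefix and a default base URL. One possible way to express that mapping is a lookup table, sketched below. `BACKENDS` and `resolve_backend` are hypothetical names, and the environment is passed in as a plain dict so the logic can be checked without a `.env` file. The URLs and the default model name are copied from the code above.

```python
# Backend name -> (environment variable prefix, default base URL)
BACKENDS = {
    "openai": ("OPENAI", "https://api.openai.com/v1"),
    "siliconflow": ("SILICONFLOW", "https://api.siliconflow.cn/v1"),
    "zhipu": ("ZHIPU", "https://open.bigmodel.cn/api/paas/v4"),
    "openvino": ("OPENVINO", "http://localhost:8000/v1"),
}


def resolve_backend(env):
    """Return base_url, api_key and model for the backend selected in env."""
    backend = env.get("LLM_BACKEND", "openai")
    if backend not in BACKENDS:
        raise ValueError(f"Unsupported LLM backend: {backend}")
    prefix, default_url = BACKENDS[backend]
    api_key = env.get(f"{prefix}_API_KEY")
    if not api_key:
        raise ValueError(f"{prefix}_API_KEY is not set")
    return {
        "base_url": env.get(f"{prefix}_BASE_URL", default_url),
        "api_key": api_key,
        "model": env.get("MODEL_NAME", "gpt-4.1-mini"),
    }


config = resolve_backend({"LLM_BACKEND": "zhipu", "ZHIPU_API_KEY": "dummy-key"})
```

In real use the argument would be `os.environ` after `load_dotenv()`; adding a backend then means adding one table row.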

src/parse_json.py ADDED Viewed

	@@ -0,0 +1,68 @@

+import json
+import re
+def markdown_to_json(markdown_str):
+    # Strip Markdown wrappers such as code-fence markers
+    markdown_str = markdown_str.strip()
+    # Remove only the opening fence here; the closing fence is handled below,
+    # so an unterminated block does not lose its last three characters
+    if markdown_str.startswith("```json"):
+        markdown_str = markdown_str[7:].strip()
+    elif markdown_str.startswith("```"):
+        markdown_str = markdown_str[3:].strip()
+    if markdown_str.endswith("```"):
+        markdown_str = markdown_str[:-3].strip()
+    # print(markdown_str)
+    # Parse the remaining string into a JSON dict
+    json_dict = json.loads(markdown_str)
+    return json_dict
+def parse_json(json_str, forced_keywords=("character_response",)):
+    # Try strict parsing first, then fall back to regex extraction
+    try:
+        return markdown_to_json(json_str)
+    except Exception:
+        try:
+            return forced_extract(json_str, forced_keywords)
+        except Exception:
+            return {}
+def forced_extract(input_str, keywords):
+    result = {key: "" for key in keywords}
+    for key in keywords:
+        # Use a regular expression to find each "key": "value" pair
+        pattern = rf'"{re.escape(key)}":\s*"(.*?)"'
+        match = re.search(pattern, input_str)
+        if match:
+            result[key] = match.group(1)
+    return result
+if __name__ == "__main__":
+    input_str = """```json
+{
+  "item_name": "手机",
+  "analysis": "在剧情中，手机作为一个可能的线索，可能会含有凶手的通讯记录或者与受害者最后的联系信息。队长李伟会指示队员们检查手机，以寻找可能的线索，如通话记录、短信、社
+交媒体应用等。",
+  "echo": "我认为在剧情设定的人物眼里，看到物品 手机时，会说",
+  "character_response": "队长李伟可能会说：'这手机可能是死者最后的通讯工具，检查一下有没有未接电话或者最近的通话记录，看看能否找到凶手的线索。'"
+}
+```"""
+    print(parse_json(input_str))
+#     center_str = """{
+#   "item_name": "手机",
+#   "analysis": "在剧情中，手机作为一个可能的线索，可能会含有凶手的通讯记录或者与受害者最后的联系信息。队长李伟会指示队员们检查手机，以寻找可能的线索，如通话记录、短信、社
+# 交媒体应用等。",
+#   "echo": "我认为在剧情设定的人物眼里，看到物品 手机时，会说",
+#   "character_response": "队长李伟可能会说：'这手机可能是死者最后的通讯工具，检查一下有没有未接电话或者最近的通话记录，看看能否找到凶手的线索。'"
+# }"""
+#     json_dict = json.loads(center_str)

src/recognize_from_image_glm.py ADDED Viewed

	@@ -0,0 +1,128 @@

+## Image recognition requirements
+# Uses the VLM API
+# Input: an image; output: an item name
+import os
+import base64
+import yaml
+from dotenv import load_dotenv
+# from resize_img import resize_image,get_img_html
+from zhipuai import ZhipuAI
+from io import BytesIO
+def get_vlm_response_cot(resized_img, candidates, model_name="glm-4v-flash", max_tokens=-1):
+    buffered = BytesIO()
+    resized_img.save(buffered, format="JPEG")
+    img_base = base64.b64encode(buffered.getvalue()).decode('utf-8')
+    prompt = """请帮助我抽取图片中的主要物体，如果命中candidates中的物品，则按照candidates输出，否则，输出主要物品的名字
+candidates: {candidates}
+Let's think step by step and output in json format, 包括以下字段:
+- caption 详细描述图像
+- major_object 物品名称
+- echo 重复字符串: 我将检查candidates中的物品，如果major_object有同义词在candidates中，则修正为candidate对应的名字，不然则保留major_object
+- fixed_object_name: 检查candidates后修正（如果命中）的名词，如果不命中则重复输出major_object
+"""
+    final_prompt = prompt.format(candidates=candidates)
+    load_dotenv()
+    your_api_key = os.getenv('ZHIPU_API_KEY')
+    if not your_api_key:
+        raise ValueError("Please set the ZHIPU_API_KEY environment variable in the project's .env file")
+    client = ZhipuAI(api_key=your_api_key)  # use your own API key
+    response = client.chat.completions.create(
+        model=model_name,  # name of the model to call
+        messages=[
+        {
+            "role": "user",
+            "content": [
+            {
+                "type": "image_url",
+                "image_url": {
+                    "url": img_base
+                }
+            },
+            {
+                "type": "text",
+                "text": final_prompt
+            }
+            ]
+        }
+        ]
+    )
+    return response.choices[0].message.content
+def get_vlm_response(img_path, candidates, model_name="glm-4v-flash", max_tokens=2048):
+    # img_path = r"asset\images\会员登记表.jpg"
+    # img_path = r"asset\images\前台工作人员.jpg"
+    # img_path= r"asset\images\烟头.jpg"
+    with open(img_path, 'rb') as img_file:
+        img_base = base64.b64encode(img_file.read()).decode('utf-8')
+    prompt = """你是一个物品分类器。请根据图片内容，从以下候选列表中选择最匹配的一项作为分类结果。
+            候选列表：{candidates}。请直接输出分类结果，不要包含任何其他描述或解释。"""
+    # Fill in the prompt template with str.format
+    final_prompt = prompt.format(candidates=candidates)
+    load_dotenv()
+    your_api_key = os.getenv('ZHIPU_API_KEY')
+    if not your_api_key:
+        raise ValueError("Please set the ZHIPU_API_KEY environment variable in the project's .env file")
+    client = ZhipuAI(api_key=your_api_key)  # use your own API key
+    response = client.chat.completions.create(
+        model=model_name,  # name of the model to call
+        messages=[
+        {
+            "role": "user",
+            "content": [
+            {
+                "type": "image_url",
+                "image_url": {
+                    "url": img_base
+                }
+            },
+            {
+                "type": "text",
+                "text": final_prompt
+            }
+            ]
+        }
+        ]
+    )
+    return response.choices[0].message.content
+# Test: print the results
+if __name__ == "__main__":
+    # img_path = r"asset\images\会员登记表.jpg"
+    # img_path = r"asset\images\前台工作人员.jpg"
+    img_paths= [r"asset\images\会员登记表.jpg", r"asset\images\手机.jpg", r"asset\images\烟头.jpg"]
+    candidates = ['人脸', '会员卡', '双节棍', '会员登记表']
+    from PIL import Image
+    def resize_image(img_path, max_height=200):
+        img = Image.open(img_path)
+        w, h = img.size
+        new_h = min(h, max_height)
+        new_w = int(w * (new_h / h))
+        img = img.resize((new_w, new_h))
+        return img
+    for img_path in img_paths:
+        print(f"图片路径: {img_path}")
+        resized_img = resize_image(img_path)
+        response = get_vlm_response_cot(resized_img, candidates)
+        print(response)

src/resize_img.py ADDED Viewed

	@@ -0,0 +1,22 @@

+from PIL import Image
+import base64
+from io import BytesIO
+def resize_image(img_path, max_height=200):
+    img = Image.open(img_path)
+    w, h = img.size
+    new_h = min(h, max_height)
+    new_w = int(w * (new_h / h))
+    img = img.resize((new_w, new_h))
+    return img
+def get_img_html(resized_img):
+    buffered = BytesIO()
+    resized_img.save(buffered, format="PNG")  # JPEG would also work
+    img_base64 = base64.b64encode(buffered.getvalue()).decode("utf-8")
+    # return img_base64
+    max_height = 200
+    # Embed the image with HTML or Markdown (fixed maximum height)
+    img_html = f'<img src="data:image/png;base64,{img_base64}" style="max-height:{max_height}px; width:auto;">'
+    return img_html
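The size arithmetic in `resize_image` caps the height at `max_height` and scales the width by the same ratio, so images shorter than the cap are left unchanged. A Pillow-free sketch of just that calculation follows. `scaled_size` is a hypothetical name, and the `max(1, ...)` guard is an addition that is not in the original code, which would compute a width of 0 for extremely tall, narrow images.

```python
def scaled_size(width, height, max_height=200):
    """Return (new_width, new_height) with height capped and aspect ratio kept."""
    new_height = min(height, max_height)
    # Scale the width by the same factor; never let it collapse to zero (added guard)
    new_width = max(1, int(width * (new_height / height)))
    return new_width, new_height


portrait = scaled_size(400, 800)   # taller than the cap: shrunk
small = scaled_size(300, 100)      # already short enough: unchanged
narrow = scaled_size(1, 1000)      # guard keeps the width at 1
```

The resulting tuple is what would be passed to `img.resize(...)`.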

test/0630discuss_prompt.txt ADDED Viewed

	@@ -0,0 +1,18 @@

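+Look at my Gradio code in the <code> section below.
+My entire front end is implemented in Gradio; the back end is implemented in Python with a GameMaster.
+The problem is that the GameMaster keeps state in memory. I could of course serialize this state to a string,
+but sending the state back and forth to the front end repeatedly would reduce Gradio's efficiency.
+With the current implementation, users interfere with each other when several of them use the app at the same time.
+If I want to use a reasonable front-end framework so that each user gets their own GameMaster instance when multiple users are active,
+how should I implement it? Consider two scenarios:
+1. I have my own domain and server
+2. I need to deploy this service on a Hugging Face or ModelScope page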
+<code>
+</code>
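The crosstalk described in this prompt comes from one module-level `GameMaster` being shared by every visitor. One framework-independent way to avoid it is a registry that creates one instance per session key. The sketch below is only an illustration: `SessionRegistry` and `FakeGameMaster` are hypothetical names, and how a session key is obtained depends on the front end. Within Gradio itself, `gr.State` (exercised in `test/test_gradio_state.py` in this commit) keeps a separate value per browser session.

```python
import threading


class SessionRegistry:
    """Create and hand out one back-end instance per session key."""

    def __init__(self, factory):
        self._factory = factory
        self._instances = {}
        self._lock = threading.Lock()  # callbacks may run concurrently

    def get(self, session_id):
        """Return the instance for session_id, creating it on first use."""
        with self._lock:
            if session_id not in self._instances:
                self._instances[session_id] = self._factory()
            return self._instances[session_id]

    def reset(self, session_id):
        """Replace the instance for session_id with a fresh one."""
        with self._lock:
            self._instances[session_id] = self._factory()
            return self._instances[session_id]

    def drop(self, session_id):
        """Forget a finished session so its state can be garbage collected."""
        with self._lock:
            self._instances.pop(session_id, None)


class FakeGameMaster:
    # Stand-in for GameMaster(yaml_path); only models the mutable status list
    def __init__(self):
        self.status = []


registry = SessionRegistry(FakeGameMaster)
alice = registry.get("alice")
alice.status.append("found phone")
bob = registry.get("bob")
```

With the real class the factory would be something like `lambda: GameMaster(yaml_path)`, and `reset` would replace the `global game_master` reassignment used by the reset button.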

test/gradio_interface.py ADDED Viewed

	@@ -0,0 +1,207 @@

+import gradio as gr
+from src.GameMaster import GameMaster
+import os
+from src.resize_img import resize_image, get_img_html
+import base64
+from io import BytesIO
+from src.fishTTS import get_audio
+yaml_path = "config/police.yaml"
+game_master = GameMaster( yaml_path )
+item_str_list = game_master.get_item_names()
+welcome_info = game_master.get_welcome_info()
+def callback_generate_audio( chatbot ):
+    if len(chatbot) == 0:
+        return None
+    response_message = chatbot[-1][1]
+    audio_path = get_audio(response_message)
+    return gr.update( value = audio_path , autoplay = True )
+def chat_submit_callback(user_message, chat_history):
+    # Call GameMaster.submit_chat to get the user message and the reply
+    user_input, bot_response = game_master.submit_chat(user_message)
+    # Append the exchange to the chat history
+    chat_history.append((user_input, bot_response))
+    return chat_history, ""
+# Item-submission counterpart of chat_submit_callback
+def item_submit_callback(item_name, chat_history):
+    # print( item_name )
+    # item_name may be None when no option is selected, so guard before strip()
+    if not item_name or not item_name.strip():
+        return chat_history, ""
+    # Call GameMaster.submit_item to get the user message and the reply
+    user_info, response_info = game_master.submit_item(item_name)
+    img_path = game_master.name2img_path(item_name)
+    if img_path is not None and os.path.exists(img_path):
+        # print(img_path)
+        # user_info = gr.Image(resize_image(img_path,max_height=200))
+        resized_img = resize_image(img_path, max_height=200)
+        img_html = get_img_html(resized_img)
+        user_info = gr.HTML(img_html)
+    # Append the exchange to the chat history
+    chat_history.append((user_info, response_info))
+    return chat_history, ""
+def img_submit_callback( image_input, chatbot):
+    '''
+    Modeled on item_submit_callback.
+    The uploaded image is resized to max_height = 400 for recognition
+    and to max_height = 200 for display in the chat.
+    The pair (resized image HTML, response) is then appended to the chat.
+    '''
+    if image_input:
+        resized_img_to_rec = resize_image(image_input, max_height=400)
+        resized_img = resize_image(image_input, max_height=200)
+        img_html = get_img_html(resized_img)
+        # response = "你上传了一张图片 我们正在编写VLM识别图片的功能"
+        user_info, response = game_master.submit_image(resized_img_to_rec)
+        chatbot.append((gr.HTML(img_html), response))
+    return chatbot
+def update_status_show():
+    current_status = game_master.get_status()
+    return current_status
+css = """
+.chatbot img {
+    max-height: 200px !important;
+    width: auto !important;
+}"""
+with gr.Blocks(title="鲸娱秘境-Intel参赛", css=css) as demo:
+    with gr.Tabs() as tabs:
+        with gr.TabItem("demo"):
+            gr.Markdown("# 鲸娱秘境-英特尔人工智能创新应用")
+            gr.Markdown('欢迎大家在点评搜索"鲸娱秘境",线上demo为游戏环节一部分，并加入多模态元素')
+            with gr.Row():
+                # Left column: chat box
+                with gr.Column(scale=2):  # takes 2 parts of the width
+                    chatbot = gr.Chatbot(label="对话窗口", height=800, value=[(None, welcome_info)])
+                    # Chat input box and send button
+                    user_input = gr.Textbox(label="输入消息", placeholder="请输入您的消息...", interactive=True)
+                    send_btn = gr.Button("发送", variant="primary")
+                # Right column: actions
+                with gr.Column(scale=1):  # takes 1 part of the width
+                    # Item selection: radio group + submit button
+                    with gr.Row():  # row holding the radio group
+                        radio_choices = gr.Radio(label="向NPC提交场景中的物品", choices= item_str_list,
+                                              value="生成描述", interactive=True)
+                    with gr.Row():
+                        item_submit_btn = gr.Button("提交场景内的物品", variant="primary")
+                    # Image upload (file or webcam)
+                    image_input = gr.Image(type="filepath", label="上传图片")
+                    with gr.Row():  # row holding the image submit button
+                        img_submit_btn = gr.Button("提交图片中的物品", variant="primary")
+                    with gr.Row():
+                        reload_btn = gr.Button("重置剧情", variant="primary")
+                    with gr.Row():
+                        audio_player = gr.Audio()
+                    # Collapsible panel, closed by default
+                    with gr.Accordion("For debug", open=False):
+                        with gr.Row():
+                            item_text_to_submit = gr.Textbox(label="直接输入物品名", value="", interactive=True, scale = 20)
+                            item_text_submit_btn = gr.Button("提交", variant="primary", scale = 1)
+                        current_status = game_master.get_status()
+                        status_display = gr.Textbox(label="agent状态显示", value= current_status,
+                                                interactive=False, max_lines=3)
+            # Message handler (defined here but not bound to any event below)
+            def send_message(user_message, chat_history):
+                if user_message.strip():
+                    chat_history.append((user_message, "正在处理您的请求..."))
+                    return "", chat_history
+                return user_message, chat_history
+            # Bind event handlers
+            send_btn.click(chat_submit_callback, [user_input, chatbot], [chatbot, user_input])
+            user_input.submit(chat_submit_callback, [user_input, chatbot], [chatbot, user_input])
+            img_submit_btn.click(
+                fn = img_submit_callback,
+                inputs = [image_input, chatbot],
+                outputs = [chatbot]
+            ).then(
+                fn=update_status_show,
+                inputs=[],
+                outputs=[status_display]
+            ).then(
+                fn = callback_generate_audio,
+                inputs=[ chatbot],
+                outputs=[audio_player]
+            )
+            # Bind the item-submit button event
+            item_submit_btn.click(
+                fn=item_submit_callback,
+                inputs=[radio_choices, chatbot],
+                outputs=[chatbot, radio_choices]
+            ).then(
+                fn=update_status_show,
+                inputs=[],
+                outputs=[status_display]
+            ).then(
+                fn=callback_generate_audio,
+                inputs = [chatbot],
+                outputs = [audio_player]
+            )
+            item_text_submit_btn.click(
+                fn=item_submit_callback,
+                inputs=[item_text_to_submit, chatbot],
+                outputs=[chatbot, item_text_to_submit]
+            ).then(
+                fn=update_status_show,
+                inputs=[],
+                outputs=[status_display]
+            ).then(
+                fn=callback_generate_audio,
+                inputs = [chatbot],
+                outputs = [audio_player]
+            )
+            # Bind the reset button event
+            def reload_game():
+                global game_master
+                game_master = GameMaster(yaml_path)
+                return [(None,game_master.get_welcome_info())], game_master.get_status()
+            reload_btn.click(
+                fn=reload_game,
+                inputs=[],
+                outputs=[chatbot, status_display]
+            )
+        with gr.TabItem("Readme"):
+            with open("demo_info.md", "r", encoding="utf-8") as f:
+                readme_content = f.read()
+            gr.Markdown(readme_content)
+if __name__ == "__main__":
+    demo.launch(share=True)

test/pyproject.toml ADDED Viewed

	@@ -0,0 +1,15 @@

+[project]
+name = "larp-vlm"
+version = "0.1.0"
+description = "Add your description here"
+readme = "README.md"
+requires-python = ">=3.11"
+dependencies = [
+    "dotenv>=0.9.9",
+    "gradio>=5.35.0",
+    "openai>=1.93.0",
+    "pillow>=9.0.0",
+    "python-dotenv>=1.0.0",
+    "pyyaml>=6.0",
+    "zhipuai>=2.0.0",
+]

test/test_gradio_state.py ADDED Viewed

	@@ -0,0 +1,16 @@

+import gradio as gr
+with gr.Blocks() as demo:
+    cart = gr.State([])
+    items_to_add = gr.CheckboxGroup(["Cereal", "Milk", "Orange Juice", "Water"])
+    def add_items(new_items, previous_cart):
+        cart = previous_cart + new_items
+        return cart
+    gr.Button("Add Items").click(add_items, [items_to_add, cart], cart)
+    cart_size = gr.Number(label="Cart Size")
+    cart.change(lambda cart: len(cart), cart, cart_size)
+demo.launch()

test/test_playground.py ADDED Viewed

	@@ -0,0 +1,28 @@

+from src.GameMaster import GameMaster
+from src.resize_img import resize_image
+yaml_path = "config/police.yaml"
+gm = GameMaster(yaml_path)
+print("GameMaster初始化成功！")
+print("Prompt steps:", gm.prompt_steps)
+print("Items:", gm.items)
+print(gm.name2img_path('双节棍'))
+print("===Response to 手机===")
+print(gm.generate_item_response("手机"))
+print("-----")
+# image_name = "asset/images/会员卡.jpg"
+# image_name = "asset/images/会员登记表.jpg"
+image_name = "asset/images/前台工作人员.jpg"
+# Load image_name and resize it to max_height = 200
+resized_img = resize_image(image_name, max_height=200)
+# Pass the resized image to the recognizer
+response = gm.extract_object_from_image(resized_img)
+print("用户上传的图片识别为:")
+print(response)

test/test_vlm.py ADDED Viewed

	@@ -0,0 +1,61 @@

+## Image recognition requirements
+# Uses the VLM API
+# Input: an image; output: an item name
+import os
+import base64
+import yaml
+from dotenv import load_dotenv
+# from resize_img import resize_image,get_img_html
+from zhipuai import ZhipuAI
+def get_vlm_response(img_path, candidates, model_name="glm-4v-flash", max_tokens=2048):
+    # img_path = r"asset\images\会员登记表.jpg"
+    # img_path = r"asset\images\前台工作人员.jpg"
+    # img_path= r"asset\images\烟头.jpg"
+    with open(img_path, 'rb') as img_file:
+        img_base = base64.b64encode(img_file.read()).decode('utf-8')
+    prompt = """你是一个物品分类器。请根据图片内容，从以下候选列表中选择最匹配的一项作为分类结果。
+            候选列表：{candidates}。请直接输出分类结果，不要包含任何其他描述或解释。"""
+    # Fill in the prompt template with str.format
+    final_prompt = prompt.format(candidates=candidates)
+    load_dotenv()
+    your_api_key = os.getenv('ZHIPU_API_KEY')
+    if not your_api_key:
+        raise ValueError("Please set the ZHIPU_API_KEY environment variable in the project's .env file")
+    client = ZhipuAI(api_key=your_api_key)  # use your own API key
+    response = client.chat.completions.create(
+        model=model_name,  # name of the model to call
+        messages=[
+        {
+            "role": "user",
+            "content": [
+            {
+                "type": "image_url",
+                "image_url": {
+                    "url": img_base
+                }
+            },
+            {
+                "type": "text",
+                "text": final_prompt
+            }
+            ]
+        }
+        ]
+    )
+    return response.choices[0].message.content
+# Test: print the results
+if __name__ == "__main__":
+    img_path = r"asset\images\会员登记表.jpg"
+    # img_path = r"asset\images\前台工作人员.jpg"
+    candidates = ['人脸', '会员卡', '双节棍', '会员登记表']
+    response = get_vlm_response(img_path, candidates)
+    print(response)

test/trans_image2html.py ADDED Viewed

	@@ -0,0 +1,14 @@

+import sys
+import os
+sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+from src.resize_img import resize_image, get_img_html
+if __name__ == "__main__":
+    image_indices = [1, 2, 3]
+    for i in image_indices:
+        img_path = "asset/images/鲸娱秘境{}.jpg".format(i)
+        resized_img = resize_image(img_path, max_height=300)
+        img_html = get_img_html(resized_img).replace('200px', '300px')
+        with open(os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "demo_info.md"), "a", encoding="utf-8") as f:
+            f.write(img_html + "\n")

uv.lock ADDED Viewed

The diff for this file is too large to render. See raw diff