tokusan2
/

style-bert-vits2-jp

@@ -1,5 +1,5 @@
 ---
-title: Style-BERT-VITS2 Japanese TTS
 emoji: 🎤
 colorFrom: blue
 colorTo: purple
@@ -12,23 +12,30 @@ tags:
 - japanese
 - style-bert-vits2
 - inference-endpoints
 license: mit
 ---
-# Style-BERT-VITS2 Japanese Text-to-Speech
 A Style-BERT-VITS2 model for Japanese text-to-speech.
-## Usage
-This model is intended for use with Hugging Face Inference Endpoints.
 ### API Example
 ```python
 import requests
-url = "https://your-endpoint.hf.space"
 headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}
 data = {
@@ -36,12 +43,17 @@ data = {
     "parameters": {
         "speaker_id": 0,
         "emotion": "neutral",
-        "speed": 1.0
     }
 }
 response = requests.post(url, headers=headers, json=data)
 result = response.json()
 ```
 ## Features
@@ -49,12 +61,40 @@ result = response.json()
 - Japanese text-to-speech
 - Multiple speakers (0-3)
 - Emotional expression control
-- Speech rate and pitch adjustment
 ## Parameters
 - `speaker_id`: Speaker ID (0-3)
 - `emotion`: Emotion (neutral, happy, sad, angry, etc.)
 - `speed`: Speech rate (0.5-2.0)
-- `pitch`: Pitch (-12.0 to 12.0)
 - `intonation`: Intonation (0.0-2.0)

 ---
+title: Style-BERT-VITS2 Japanese TTS (Real Model)
 emoji: 🎤
 colorFrom: blue
 colorTo: purple
 - japanese
 - style-bert-vits2
 - inference-endpoints
+- real-model
 license: mit
 ---
+# Style-BERT-VITS2 Japanese Text-to-Speech (Real Model Integration)
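 A Style-BERT-VITS2 model for Japanese text-to-speech.
+It integrates the actual pretrained model (litagin/Style-Bert-VITS2-1.0-base).
+## 🆕 New Features
+- ✅ **Actual Style-BERT-VITS2 model integration**
+- ✅ **Improved audio waveform generation**
+- ✅ **Pitch, speed, and volume control**
+- ✅ **Automatic model download**
+## Usage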
 ### API Example
 ```python
 import requests
+url = "https://j3meo1ty1iv2knlo.us-east-1.aws.endpoints.huggingface.cloud"
 headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}
 data = {
     "parameters": {
         "speaker_id": 0,
         "emotion": "neutral",
+        "speed": 1.0,
+        "pitch": 2.0,
+        "volume": 0.8
     }
 }
 response = requests.post(url, headers=headers, json=data)
 result = response.json()
+# Get the Base64-encoded audio data
+audio_base64 = result[0]["audio_base64"]
 ```
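 ## Features
 - Japanese text-to-speech
 - Multiple speakers (0-3)
 - Emotional expression control
+- Speech rate, pitch, and volume adjustment
+- Natural audio waveform generation
 ## Parameters
 - `speaker_id`: Speaker ID (0-3)
 - `emotion`: Emotion (neutral, happy, sad, angry, etc.)
 - `speed`: Speech rate (0.5-2.0)
+- `pitch`: Pitch (-12.0 to 12.0 semitones)
+- `volume`: Volume (0.0-2.0)
 - `intonation`: Intonation (0.0-2.0)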
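The documented ranges can be checked on the client before a request is sent. The following is a minimal sketch, not part of the model card: `PARAM_RANGES`, `SPEAKER_IDS`, and `clamp_parameters` are hypothetical names, and clamping out-of-range values is an assumption, since the README does not say how the endpoint itself treats them.

```python
# Ranges copied from the parameter list above. How the endpoint handles
# out-of-range values is not documented, so clamping here is an assumption.
PARAM_RANGES = {
    "speed": (0.5, 2.0),
    "pitch": (-12.0, 12.0),      # semitones
    "volume": (0.0, 2.0),
    "intonation": (0.0, 2.0),
}
SPEAKER_IDS = range(0, 4)        # speaker IDs 0-3


def clamp_parameters(params: dict) -> dict:
    """Return a copy of params with numeric values clamped to the documented ranges."""
    cleaned = dict(params)
    for name, (low, high) in PARAM_RANGES.items():
        if name in cleaned:
            cleaned[name] = min(max(float(cleaned[name]), low), high)
    # A speaker ID cannot be clamped meaningfully, so reject it instead.
    if "speaker_id" in cleaned and cleaned["speaker_id"] not in SPEAKER_IDS:
        raise ValueError(f"speaker_id must be in 0-3, got {cleaned['speaker_id']}")
    return cleaned
```

The cleaned dictionary could then be passed as the `"parameters"` value of the request body shown in the API example.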
+## Technical Specifications
+- **Base model**: litagin/Style-Bert-VITS2-1.0-base
+- **Sampling rate**: 44.1 kHz
+- **Format**: WAV (16-bit PCM)
+- **GPU acceleration**: NVIDIA L4
+- **Autoscaling**: Scale-to-Zero supported
+## Log Information
+The response includes the following information:
+```json
+{
+    "audio_base64": "UklGRi4AAABXQVZFZm10...",
+    "sample_rate": 44100,
+    "duration": 2.5,
+    "model_info": {
+        "name": "Style-BERT-VITS2",
+        "version": "2.0-base-JP-Extra",
+        "model_loaded": true,
+        "device": "cuda"
+    }
+}
+```
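A response shaped like the JSON above can be turned back into audio with the standard library alone. This is a rough sketch under stated assumptions: `decode_audio`, `wav_info`, and `make_silent_wav` are hypothetical helper names, and the payload is assumed to carry a complete WAV file (header included) in `audio_base64`, as the `UklGR...` prefix suggests.

```python
import base64
import io
import wave


def decode_audio(payload: dict) -> bytes:
    """Return the raw WAV bytes from a response object shaped like the JSON above."""
    return base64.b64decode(payload["audio_base64"])


def wav_info(wav_bytes: bytes) -> dict:
    """Read sample rate, sample width, channel count, and duration from WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        rate = wf.getframerate()
        return {
            "sample_rate": rate,
            "sample_width_bytes": wf.getsampwidth(),
            "channels": wf.getnchannels(),
            "duration": wf.getnframes() / rate,
        }


def make_silent_wav(seconds: float = 0.1, rate: int = 44100) -> bytes:
    """Build a short silent 16-bit mono WAV, used only to exercise the helpers offline."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    # Stand-in payload; a real one would come from response.json().
    fake = {"audio_base64": base64.b64encode(make_silent_wav()).decode("ascii")}
    audio = decode_audio(fake)
    print(wav_info(audio))
    with open("output.wav", "wb") as f:
        f.write(audio)
```

Note that the API example reads `result[0]["audio_base64"]`, while the JSON here shows a single object. If the endpoint returns a list, pass `result[0]` to `decode_audio`.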