Borcherding committed on
Commit c39b52e · verified · 1 Parent(s): d6ba6c6

Update README.md

Files changed (1)
  1. README.md +108 -106
README.md CHANGED
@@ -1,106 +1,108 @@
1
- ---
2
- license: other
3
- license_name: coqui-public-model-license
4
- license_link: https://coqui.ai/cpml
5
- library_name: coqui
6
- pipeline_tag: text-to-speech
7
- widget:
8
- - text: "Once when I was six years old I saw a magnificent picture"
9
- ---
10
-
11
- # ⓍTTS_v2 - CarliG Fine-Tuned Model
12
-
13
- This repository hosts a fine-tuned version of the ⓍTTS model, trained on 2 minutes of unique voice lines from AtheneLive's CarliG AI, the iconic GPT-4 chatbot that went viral after the release of the GPT-4 API. The voice lines were sourced from Athene's live streams, which can be found here:
14
- [AtheneLive George Carlin & CarliG livestream](https://www.youtube.com/watch?v=UMkZEQftZWA&t=5719s)
15
-
16
- ![CarliG](carli_avatar_head.png)
17
-
18
- Listen to a sample of the ⓍTTS_v2 - CarliG Fine-Tuned Model:
19
-
20
- <audio controls>
21
- <source src="https://huggingface.co/Borcherding/XTTS-v2_CarliG/raw/main/sample_carlig_readme.wav" type="audio/wav">
22
- Your browser does not support the audio element.
23
- </audio>
24
-
25
- Here's a CarliG mp3 voice line clip from the training data:
26
-
27
- <audio controls>
28
- <source src="https://huggingface.co/Borcherding/XTTS-v2_CarliG/raw/main/reference.mp3" type="audio/mpeg">
29
- Your browser does not support the audio element.
30
- </audio>
31
-
32
- ## Features
33
- - 🎙️ **Voice Cloning**: Realistic voice cloning with just a short audio clip.
34
- - 🌍 **Multi-Lingual Support**: Generates speech in 17 different languages while maintaining CarliG's distinct voice.
35
- - 😃 **Emotion & Style Transfer**: Captures the emotional tone and style of the original voice.
36
- - 🔄 **Cross-Language Cloning**: Maintains the unique voice characteristics across different languages.
37
- - 🎧 **High-Quality Audio**: Outputs at a 24kHz sampling rate for clear and high-fidelity audio.
38
-
39
- ## Supported Languages
40
- The model supports the following 17 languages: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu), Korean (ko), and Hindi (hi).
41
-
42
- ## Usage in Roll Cage
43
- 🤖💬 Boost your AI experience with this Ollama add-on! Enjoy real-time audio 🎙️ and text 🔍 chats, LaTeX rendering 📜, agent automations ⚙️, workflows 🔄, text-to-image 📝➡️🖼️, image-to-text 🖼️➡️🔤, image-to-video 🖼️➡️🎥 transformations. Fine-tune text 📝, voice 🗣️, and image 🖼️ gens. Includes Windows macro controls 🖥️ and DuckDuckGo search.
44
-
45
- [ollama_agent_roll_cage (OARC)](https://github.com/Leoleojames1/ollama_agent_roll_cage) is a completely local Python & CMD toolset add-on for the Ollama command line interface. The OARC toolset automates the creation of agents, giving the user more control over the likely output. It provides SYSTEM prompt templates for each ./Modelfile, allowing users to design and deploy custom agents quickly. Users can select which local model file is used in agent construction with the desired system prompt.
46
-
47
- ## Why This Model for Roll Cage?
48
- The CarliG fine-tuned model was designed for the Roll Cage chatbot to enhance user interaction with a familiar and beloved voice. By incorporating CarliG's distinctive speech patterns and tone, Roll Cage becomes more engaging and entertaining. The addition of multi-lingual support and emotion transfer ensures that the chatbot can communicate effectively and expressively across different languages and contexts, providing a more immersive experience for users.
49
-
50
- ## CoquiTTS and Resources
51
- - 🐸💬 **CoquiTTS**: [Coqui TTS on GitHub](https://github.com/coqui-ai/TTS)
52
- - 📚 **Documentation**: [ReadTheDocs](https://tts.readthedocs.io/en/latest/)
53
- - 👩‍💻 **Questions**: [GitHub Discussions](https://github.com/coqui-ai/TTS/discussions)
54
- - 🗯 **Community**: [Discord](https://discord.gg/5eXr5seRrv)
55
-
56
- ## License
57
- This model is licensed under the [Coqui Public Model License](https://coqui.ai/cpml). Read more about the origin story of CPML [here](https://coqui.ai/blog/tts/cpml).
58
-
59
- ## Contact
60
- Join our 🐸Community on [Discord](https://discord.gg/fBC58unbKE) and follow us on [Twitter](https://twitter.com/coqui_ai). For inquiries, email us at [email protected].
61
-
62
- Using 🐸TTS API:
63
-
64
- ```python
65
- from TTS.api import TTS
66
- tts = TTS(model_path="D:/CodingGit_StorageHDD/Ollama_Custom_Mods/ollama_agent_roll_cage/AgentFiles/Ignored_TTS/XTTS-v2_CarliG/",
67
- config_path="D:/CodingGit_StorageHDD/Ollama_Custom_Mods/ollama_agent_roll_cage/AgentFiles/Ignored_TTS/XTTS-v2_CarliG/config.json", progress_bar=False, gpu=True)
68
-
69
- # generate speech by cloning a voice using default settings
70
- tts.tts_to_file(text="It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.",
71
- file_path="output.wav",
72
- speaker_wav="/path/to/target/speaker.wav",
73
- language="en")
74
-
75
- ```
76
-
77
- Using 🐸TTS Command line:
78
-
79
- ```console
80
- tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
81
- --text "Bugün okula gitmek istemiyorum." \
82
- --speaker_wav /path/to/target/speaker.wav \
83
- --language_idx tr \
84
- --use_cuda true
85
- ```
86
-
87
- Using the model directly:
88
-
89
- ```python
90
- from TTS.tts.configs.xtts_config import XttsConfig
91
- from TTS.tts.models.xtts import Xtts
92
-
93
- config = XttsConfig()
94
- config.load_json("/path/to/xtts/config.json")
95
- model = Xtts.init_from_config(config)
96
- model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", eval=True)
97
- model.cuda()
98
-
99
- outputs = model.synthesize(
100
- "It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
101
- config,
102
- speaker_wav="/data/TTS-public/_refclips/3.wav",
103
- gpt_cond_len=3,
104
- language="en",
105
- )
106
- ```
 
 
 
1
+ ---
2
+ license: other
3
+ license_name: coqui-public-model-license
4
+ license_link: https://coqui.ai/cpml
5
+ library_name: coqui
6
+ pipeline_tag: text-to-speech
7
+ widget:
8
+ - text: "Once when I was six years old I saw a magnificent picture"
9
+ ---
10
+
11
+ # ⓍTTS_v2 - CarliG Fine-Tuned Model
12
+
13
+ This repository hosts a fine-tuned version of the ⓍTTS model, trained on 2 minutes of unique voice lines from AtheneLive's CarliG AI, the iconic GPT-4 chatbot that went viral after the release of the GPT-4 API. The voice lines were sourced from Athene's live streams, which can be found here:
14
+ [AtheneLive George Carlin & CarliG livestream](https://www.youtube.com/watch?v=UMkZEQftZWA&t=5719s)
15
+
16
+ ![CarliG](carli_avatar_head.png)
17
+
18
+ Listen to a sample of the ⓍTTS_v2 - CarliG Fine-Tuned Model:
19
+
20
+ <audio controls>
21
+ <source src="https://huggingface.co/Borcherding/XTTS-v2_CarliG/raw/main/sample_carlig_readme.wav" type="audio/wav">
22
+ Your browser does not support the audio element.
23
+ </audio>
24
+
25
+ Here's a CarliG mp3 voice line clip from the training data:
26
+
27
+ <audio controls>
28
+ <source src="https://huggingface.co/Borcherding/XTTS-v2_CarliG/raw/main/reference.mp3" type="audio/mpeg">
29
+ Your browser does not support the audio element.
30
+ </audio>
31
+
32
+ ## Features
33
+ - 🎙️ **Voice Cloning**: Realistic voice cloning with just a short audio clip.
34
+ - 🌍 **Multi-Lingual Support**: Generates speech in 17 different languages while maintaining CarliG's distinct voice.
35
+ - 😃 **Emotion & Style Transfer**: Captures the emotional tone and style of the original voice.
36
+ - 🔄 **Cross-Language Cloning**: Maintains the unique voice characteristics across different languages.
37
+ - 🎧 **High-Quality Audio**: Outputs at a 24kHz sampling rate for clear and high-fidelity audio.
38
+
39
+ ## Supported Languages
40
+ The model supports the following 17 languages: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu), Korean (ko), and Hindi (hi).
41
+
42
+ ## Usage in Roll Cage
43
+ 🤖💬 Boost your AI experience with this Ollama add-on! Enjoy real-time audio 🎙️ and text 🔍 chats, LaTeX rendering 📜, agent automations ⚙️, workflows 🔄, text-to-image 📝➡️🖼️, image-to-text 🖼️➡️🔤, image-to-video 🖼️➡️🎥 transformations. Fine-tune text 📝, voice 🗣️, and image 🖼️ gens. Includes Windows macro controls 🖥️ and DuckDuckGo search.
44
+
45
+ [ollama_agent_roll_cage (OARC)](https://github.com/Leoleojames1/ollama_agent_roll_cage) is a completely local Python & CMD toolset add-on for the Ollama command line interface. The OARC toolset automates the creation of agents, giving the user more control over the likely output. It provides SYSTEM prompt templates for each ./Modelfile, allowing users to design and deploy custom agents quickly. Users can select which local model file is used in agent construction with the desired system prompt.
46
+
47
+ ## Why This Model for Roll Cage?
48
+ The CarliG fine-tuned model was designed for the Roll Cage chatbot to enhance user interaction with a familiar and beloved voice. By incorporating CarliG's distinctive speech patterns and tone, Roll Cage becomes more engaging and entertaining. The addition of multi-lingual support and emotion transfer ensures that the chatbot can communicate effectively and expressively across different languages and contexts, providing a more immersive experience for users.
49
+
50
+ Coqui TTS is now maintained as a fork by the Idiap Research Institute:
51
+
52
+ ## CoquiTTS and Resources
53
+ - 🐸💬 **idiap/CoquiTTS**: [Coqui TTS on GitHub](https://github.com/idiap/coqui-ai-TTS?tab=readme-ov-file)
54
+ - 📚 **Documentation**: [ReadTheDocs](https://tts.readthedocs.io/en/latest/)
55
+ - 👩‍💻 **Questions**: [GitHub Discussions](https://github.com/coqui-ai/TTS/discussions)
56
+ - 🗯 **Community**: [Discord](https://discord.gg/5eXr5seRrv)
57
+
58
+ ## License
59
+ This model is licensed under the [Coqui Public Model License](https://coqui.ai/cpml). Read more about the origin story of CPML [here](https://coqui.ai/blog/tts/cpml).
60
+
61
+ ## Contact
62
+ Join our 🐸Community on [Discord](https://discord.gg/fBC58unbKE) and follow us on [Twitter](https://twitter.com/coqui_ai). For inquiries, email us at [email protected].
63
+
64
+ Using 🐸TTS API:
65
+
66
+ ```python
67
+ from TTS.api import TTS
68
+ tts = TTS(model_path="D:/CodingGit_StorageHDD/Ollama_Custom_Mods/ollama_agent_roll_cage/AgentFiles/Ignored_TTS/XTTS-v2_CarliG/",
69
+ config_path="D:/CodingGit_StorageHDD/Ollama_Custom_Mods/ollama_agent_roll_cage/AgentFiles/Ignored_TTS/XTTS-v2_CarliG/config.json", progress_bar=False, gpu=True)
70
+
71
+ # generate speech by cloning a voice using default settings
72
+ tts.tts_to_file(text="It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.",
73
+ file_path="output.wav",
74
+ speaker_wav="/path/to/target/speaker.wav",
75
+ language="en")
76
+
77
+ ```
78
+
79
+ Using 🐸TTS Command line:
80
+
81
+ ```console
82
+ tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
83
+ --text "Bugün okula gitmek istemiyorum." \
84
+ --speaker_wav /path/to/target/speaker.wav \
85
+ --language_idx tr \
86
+ --use_cuda true
87
+ ```
88
+
89
+ Using the model directly:
90
+
91
+ ```python
92
+ from TTS.tts.configs.xtts_config import XttsConfig
93
+ from TTS.tts.models.xtts import Xtts
94
+
95
+ config = XttsConfig()
96
+ config.load_json("/path/to/xtts/config.json")
97
+ model = Xtts.init_from_config(config)
98
+ model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", eval=True)
99
+ model.cuda()
100
+
101
+ outputs = model.synthesize(
102
+ "It took me quite a long time to develop a voice and now that I have it I am not going to be silent.",
103
+ config,
104
+ speaker_wav="/data/TTS-public/_refclips/3.wav",
105
+ gpt_cond_len=3,
106
+ language="en",
107
+ )
108
+ ```
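
The last README example leaves the synthesized audio in `outputs` without writing it anywhere, and the README lists 17 language codes and a 24 kHz output rate. As a rough sketch, the helpers below check a language code against that list and write float samples to a mono 16-bit WAV file using only the standard library. The names `check_language` and `save_wav` are hypothetical and not part of Coqui TTS, and the sketch assumes the samples are floats in the range [-1.0, 1.0].

```python
import math
import struct
import wave

# Language codes copied from the "Supported Languages" section of the README.
SUPPORTED_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
}

SAMPLE_RATE = 24_000  # the README states a 24 kHz output rate


def check_language(code: str) -> str:
    """Return the normalized code, or raise ValueError if it is not listed."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized


def save_wav(samples, path: str, sample_rate: int = SAMPLE_RATE) -> int:
    """Write float samples as mono 16-bit PCM and return the frame count.

    Assumption: samples lie in [-1.0, 1.0]; values outside are clipped.
    """
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return len(frames) // 2


if __name__ == "__main__":
    # Stand-in for model output: one second of a 440 Hz tone.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
            for n in range(SAMPLE_RATE)]
    language = check_language("EN")
    frame_count = save_wav(tone, "tone.wav")
    print(language, frame_count)
```

With the model loaded as in the README, something like `save_wav(outputs["wav"], "output.wav")` might work, assuming `outputs` exposes the waveform under a `"wav"` key as a sequence of floats; check the Coqui TTS documentation for the actual return structure before relying on it.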