Qwen/Qwen2.5-Omni-7B · Russian language support

28 days ago

Is Russian language support expected for voice input and output?

26 days ago

Yes, I think there will be support for any language, as the model will be released in the Transformers library soon (I hope)!
The docs will show you how to use a Whisper model as the audio component and a SigLIP model as the image component to create a new version of the model.
So you should be able to construct and train a new model from pretrained components,
as well as use a DeepSeek model for your LLM!

Wav2Vec2 and Whisper are the audio components, and they are available for various languages and speakers!
Whisper is also trainable!

For me, I would create a new model from pretrained (Russian) components and then merge them,
hopefully maintaining both languages in the new model?

Or extract the Whisper components from the model and merge in my own Russian Whisper one, since I think DeepSeek is multilingual!
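Merging can be done in several ways; the simplest is linear weight averaging. A minimal sketch, assuming both models share identical parameter names and shapes, with plain Python lists standing in for tensors (real checkpoints would be PyTorch state dicts, and `linear_merge` is a hypothetical helper, not a library function):

```python
def linear_merge(state_a, state_b, alpha=0.5):
    """Return alpha * A + (1 - alpha) * B for every parameter (hypothetical helper)."""
    # Linear merging only makes sense when both models have the same architecture.
    if state_a.keys() != state_b.keys():
        raise ValueError("models must have identical parameter names")
    merged = {}
    for name, weights_a in state_a.items():
        weights_b = state_b[name]
        if len(weights_a) != len(weights_b):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise interpolation between the two sets of weights.
        merged[name] = [alpha * a + (1 - alpha) * b for a, b in zip(weights_a, weights_b)]
    return merged


english = {"layer.weight": [1.0, 2.0]}
russian = {"layer.weight": [3.0, 4.0]}
print(linear_merge(english, russian))  # {'layer.weight': [2.0, 3.0]}
```

Averaging does not guarantee that both languages survive; `alpha` just controls how far the result leans toward each parent.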

Chinese models are generally English/Chinese, so they can be trained on both languages! This does not restrict the model to English as its sole European language, as most Latin-based languages can be easily added. For non-standard alphabets or Unicode, you would need a tokenizer for your model that is multilingual and contains all the characters of the target language!
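A quick way to check the tokenizer point above, assuming you have the vocabulary as a plain list of readable token strings (`missing_characters` is a hypothetical helper). This check only makes sense for vocabularies stored as readable text; byte-level BPE tokenizers can encode any Unicode text via raw bytes, so for those, counting tokens per word is the more useful signal:

```python
RUSSIAN_ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"  # all 33 lowercase letters


def missing_characters(vocab, alphabet=RUSSIAN_ALPHABET):
    """Return the letters (lower- and upper-case) that never appear in any vocab token."""
    seen = set("".join(vocab))
    letters = set(alphabet) | set(alphabet.upper())
    return sorted(letters - seen)


toy_vocab = ["при", "вет", "мир"]  # hypothetical toy vocabulary
print(len(missing_characters(toy_vocab)))  # how many Russian letters are uncovered
```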

For a quick start, I would try constructing the model with the Russian pretrained components, then use Unsloth to train the vision/LLM components, and then use the TRL library to train the audio component. Phew! Hopefully Unsloth will create a trainer for this model, as I do think this model is the prime base model! It only generates speech and text, nothing else, though!

But we have diffusers for generation (very good ones, but too large!). So I would hope that image generation will begin to use reverse image processing, i.e. convert the image to base64, train for captions, then train for retrieval of the same base64 images (I did this and it was successful). It may not be diffusion (fuzzy logic), but it's lightweight image generation!

Potentially, perhaps?
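The base64 idea above can be sketched as a data-preparation step: each image becomes a text target that a language model could learn to emit from a caption. A minimal sketch only; the helper names and the `prompt`/`completion` field names are assumptions, not any particular trainer's required format. Keep in mind base64 inflates the data by about a third (4 output characters per 3 input bytes), so even small images produce long sequences:

```python
import base64


def image_to_base64(image_bytes: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string (the text target)."""
    return base64.b64encode(image_bytes).decode("ascii")


def base64_to_image(encoded: str) -> bytes:
    """Decode a generated base64 string back into raw image bytes."""
    return base64.b64decode(encoded)


def make_pair(caption: str, image_bytes: bytes) -> dict:
    """Build one hypothetical caption -> base64 training example."""
    return {"prompt": caption, "completion": image_to_base64(image_bytes)}


fake_image = b"\x89PNG\r\n\x1a\n"  # just the PNG signature, standing in for a real file
pair = make_pair("a red square", fake_image)
assert base64_to_image(pair["completion"]) == fake_image  # lossless round trip
```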

alexdekan030

5 days ago

That's crazy, LeroyDyer, LOL