Smoliakov PRO
Yehor
AI & ML interests
Speech-to-Text, Text-to-Speech, Voice over Internet Protocol
Recent Activity
reacted to leonardlin's post about 11 hours ago
updated a model about 21 hours ago: Yehor/w2v-bert-uk-v2.1_onnx-gpu_op14_fp32
published a model about 21 hours ago: Yehor/w2v-bert-uk-v2.1_onnx-gpu_op14_fp32
Organizations
Yehor's activity

reacted to leonardlin's post about 11 hours ago
Happy to announce the release of Shisa V2, our latest generation of our bilingual Japanese-English language models. After hundreds of ablations and months of work, we're releasing some of the strongest open Japanese models at 7B, 8B, 12B, 14B, 32B and 70B! Full announcement here https://shisa.ai/posts/shisa-v2/ or visit the Shisa V2 HF collection:
shisa-ai/shisa-v2-67fc98ecaf940ad6c49f5689

replied to their post 2 days ago
Also, tested it on an A100 with TensorRT:
https://colab.research.google.com/drive/1-agoo5ll-hWEecWQAtO1FM39sqavJxph?usp=sharing
The results are not as clear-cut, but it runs the base_rfdetr_fp16.onnx model at ~10 ms/img.
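For context, per-image numbers like the ~10 ms/img above are usually measured by timing repeated forward passes after a few warm-up calls (so engine build and allocation don't skew the result). A minimal sketch of such a harness; the `infer` callable is a stand-in for the actual TensorRT/ONNX session, which is an assumption here:

```python
import time

def measure_ms_per_image(infer, images, warmup=3, repeats=10):
    """Average steady-state per-image latency in milliseconds.

    `infer` is any callable taking one image; warm-up calls are
    discarded so lazy initialization is not counted.
    """
    for img in images[:warmup]:
        infer(img)
    start = time.perf_counter()
    for _ in range(repeats):
        for img in images:
            infer(img)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(images)) * 1000.0
```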

posted an update 3 days ago
I have made a Rust project that integrates the latest state-of-the-art object-detection model, and it outperforms YOLO!
Check it out: https://github.com/egorsmkv/rf-detr-usls

replied to their post 7 days ago
This program does what datasets does: when you push a dataset created by the audiofolder script, it creates Parquet data and shards it internally.
So you can use audios-to-dataset instead if you need faster speeds than datasets provides.
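The internal sharding mentioned above can be illustrated with a small sketch: files are grouped greedily into shards up to a size limit. The limit and the greedy strategy are illustrative assumptions, not the exact behavior of datasets or audios-to-dataset:

```python
def plan_shards(file_sizes, max_shard_bytes):
    """Greedily group files (by index) into shards no larger than
    max_shard_bytes, mimicking how a dataset writer splits rows
    across Parquet shards."""
    shards, current, current_size = [], [], 0
    for i, size in enumerate(file_sizes):
        # Start a new shard once the current one would overflow.
        if current and current_size + size > max_shard_bytes:
            shards.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        shards.append(current)
    return shards
```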

posted an update 7 days ago
Convert your audio data to Parquet/DuckDB files at blazingly fast speeds!
Repository with pre-built binaries: https://github.com/crs-org/audios-to-dataset

replied to their post 9 days ago
My channel in Telegram: https://t.me/doing_something

posted an update 9 days ago
Create spectrograms using Rust!
I slightly improved a nice project that creates spectrograms, which I mentioned earlier in my channel, and built binaries for different platforms using cross-rs.
Repo: https://github.com/crs-org/sonogram
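At its core, a spectrogram is just windowed DFT magnitudes over time. A dependency-free Python sketch of that idea; the Rust project above is far faster and this is not its actual algorithm, just the underlying math:

```python
import cmath

def spectrogram(samples, win_len=64, hop=32):
    """Magnitude spectrogram: slide a window over the signal and take
    the magnitude of each window's DFT (naive O(n^2) DFT per frame)."""
    frames = []
    for start in range(0, len(samples) - win_len + 1, hop):
        win = samples[start:start + win_len]
        mags = []
        # Keep only non-negative frequencies; real projects use an FFT
        # and apply a window function (e.g. Hann) before transforming.
        for k in range(win_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * cmath.pi * k * n / win_len)
                      for n, x in enumerate(win))
            mags.append(abs(acc))
        frames.append(mags)
    return frames  # shape: time frames x frequency bins
```

A pure sine at bin k produces a single peak at that bin, which makes the sketch easy to sanity-check.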

replied to their post 12 days ago

posted an update 12 days ago
Added more pre-built executables to extract-audio, which I released recently.
See my previous post - https://huggingface.co/posts/Yehor/654118712490771
Repository: https://github.com/crs-org/extract-audio

posted an update 14 days ago
Made a simple Python script to generate an Argilla project for audio annotation from a dataset:
https://github.com/egorsmkv/argilla-audio-annotation

posted an update 16 days ago
Are you interested in different runtimes for AI models?
Check out IREE (iree.dev): it converts models to MLIR and then executes them on different platforms.
I have tested it in Rust on CPU and CUDA: https://github.com/egorsmkv/eerie-yolo11

replied to their post 21 days ago

posted an update 21 days ago
Extract audio datasets with Rust at blazingly fast speeds!
With this tool you can extract audio files from a Parquet or Arrow file generated by the Hugging Face datasets library.
Repository: https://github.com/egorsmkv/extract-audio
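Conceptually, the datasets library embeds each audio row as a struct of raw bytes plus the original path, so extraction boils down to writing those bytes back to disk. A pure-Python sketch of that final step; reading the actual Parquet/Arrow container (e.g. with pyarrow) is omitted, and the row layout below mirrors the Hugging Face audio feature as an assumption:

```python
import os

def extract_rows(rows, out_dir):
    """Write embedded audio bytes back to individual files.

    Each row is assumed to look like:
    {"audio": {"bytes": b"...", "path": "clip.wav"}}
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for row in rows:
        audio = row["audio"]
        # Use only the basename so a stored path can't escape out_dir.
        target = os.path.join(out_dir, os.path.basename(audio["path"]))
        with open(target, "wb") as f:
            f.write(audio["bytes"])
        written.append(target)
    return written
```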

posted an update about 1 month ago
If you spend a lot of time in Telegram, use this bot to monitor the state of your ML lab:
https://github.com/egorsmkv/gpu-state-tgbot
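A bot like this typically polls `nvidia-smi` in CSV mode and forwards the parsed state to chat. A sketch of the parsing step; the query fields are standard `nvidia-smi --query-gpu` fields, but how the actual bot gathers its data is an assumption:

```python
import csv
import io

def parse_gpu_state(csv_text):
    """Parse the output of:
    nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total
               --format=csv,noheader,nounits
    into a list of per-GPU dicts (memory values are in MiB)."""
    gpus = []
    for row in csv.reader(io.StringIO(csv_text.strip())):
        index, name, util, mem_used, mem_total = [f.strip() for f in row]
        gpus.append({
            "index": int(index),
            "name": name,
            "util_pct": int(util),
            "mem_used_mib": int(mem_used),
            "mem_total_mib": int(mem_total),
        })
    return gpus
```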

reacted to eliebak's post with 🔥 about 1 month ago
Google just dropped an exciting technical report for the brand-new Gemma3 model! Here are my personal notes highlighting the most intriguing architectural innovations, design choices, and insights from this release:
1) Architecture choices:
> No more softcapping, replaced by QK-Norm
> Both pre AND post norm
> Wider MLP than Qwen2.5, ~same depth
> SWA with a 5:1 ratio and a 1024-token window (very small, and a cool ablation in the paper!)
> No MLA to save KV cache, SWA does the job!
2) Long context
> Only increase the RoPE base in the global layers (to 1M)
> Confirmation that long context is harder for smol models, no 128k for the 1B
> Pretrained with 32k context? seems very high
> No YaRN or Llama3-like RoPE extension
3) Distillation
> Only keep the first 256 logits from the teacher
> Ablation on the teacher gap (tl;dr: you need some "patience" to see that using a small teacher is better)
> On-policy distillation, yeah! (by @agarwl_ et al.), not sure if the teacher gap behaves the same here, curious if someone has more info
4) Others
> Checkpoints with QAT, that's very cool
> RL using an improved version of BOND, WARM/WARP, a good excuse to look at @ramealexandre's papers
> Only uses ZeRO-3, no TP/PP if I understand correctly?
> Training budget relatively similar to Gemma2
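The 5:1 SWA ratio in the notes above means five local (sliding-window) attention layers for every global one. A small sketch of that interleaving; the exact placement of the global layers is an illustrative assumption inferred from the note, not Google's implementation:

```python
def layer_attention_types(num_layers, local_per_global=5, window=1024):
    """For each layer, return ("local", window) or ("global", None),
    placing one global layer after every `local_per_global` local ones."""
    types = []
    for i in range(num_layers):
        if (i + 1) % (local_per_global + 1) == 0:
            types.append(("global", None))   # full attention, long-range RoPE
        else:
            types.append(("local", window))  # sliding-window attention
    return types
```

The small 1024-token window keeps the KV cache of local layers tiny, which is how SWA substitutes for MLA as a cache-saving measure.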

reacted to tomaarsen's post about 1 month ago
An assembly of 18 European companies, labs, and universities has banded together to launch 🇪🇺 EuroBERT! It's a state-of-the-art multilingual encoder for 15 European languages, designed to be finetuned for retrieval, classification, etc.
🇪🇺 15 Languages: English, French, German, Spanish, Chinese, Italian, Russian, Polish, Portuguese, Japanese, Vietnamese, Dutch, Arabic, Turkish, Hindi
3️⃣ 3 model sizes: 210M, 610M, and 2.1B parameters - very very useful sizes in my opinion
⚡️ Sequence length of 8192 tokens! Nice to see these higher sequence lengths for encoders becoming more common.
Architecture based on Llama, but with bi-directional (non-causal) attention to turn it into an encoder. Flash Attention 2 is supported.
🔥 A new Pareto frontier (stronger *and* smaller) for multilingual encoder models
Evaluated against mDeBERTa, mGTE, XLM-RoBERTa for Retrieval, Classification, and Regression (after finetuning for each task separately): EuroBERT punches way above its weight.
Detailed paper with all details, incl. data: FineWeb for English and CulturaX for multilingual data, The Stack v2 and Proof-Pile-2 for code.
Check out the release blogpost here: https://huggingface.co/blog/EuroBERT/release
* EuroBERT/EuroBERT-210m
* EuroBERT/EuroBERT-610m
* EuroBERT/EuroBERT-2.1B
The next step is for researchers to build upon the 3 EuroBERT base models and publish strong retrieval, zero-shot classification, etc. models for all to use. I'm very much looking forward to it!

replied to their post about 1 month ago
Also, the Q&A dataset:

posted an update about 1 month ago
Published some datasets for researchers in Ukrainian NLP from my project https://ua-lawyer.com (a Q&A platform in Ukraine):
Datasets:
- ua-l/topics
- ua-l/topics-train-test
- ua-l/topics-text-label
Model:
- https://huggingface.co/ua-l/topics-classifier
Space:
- ua-l/topics-classifier-demo

posted an update about 1 month ago
Published a stable version of my Ukrainian Text-to-Speech library on GitHub and PyPI.
Features:
- Multi-speaker model: 2 female (Tetiana, Lada) + 1 male (Mykyta) voices;
- Fine-grained control over speech parameters, including duration, fundamental frequency (F0), and energy;
- High-fidelity speech generation using the RAD-TTS++ acoustic model;
- Fast vocoding using Vocos;
- Synthesizes long sentences effectively;
- Supports a sampling rate of 44.1 kHz;
- Tested on Linux environments and Windows/WSL;
- Python API (requires Python 3.9 or later);
- CUDA-enabled for GPU acceleration.
Repository: https://github.com/egorsmkv/tts_uk
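Saving synthesized samples at the library's 44.1 kHz rate needs nothing beyond the Python standard library. A minimal sketch using the stdlib `wave` module; this is generic PCM-writing code, not the tts_uk API:

```python
import math
import struct
import wave

def write_wav(path, samples, rate=44100):
    """Write mono float samples in [-1, 1] as 16-bit PCM WAV."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(rate)   # 44.1 kHz, matching the model output
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        w.writeframes(frames)

# Example: a 0.1 s, 440 Hz test tone at 44.1 kHz.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)]
```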