karaokepoc (Karaoke POC)

Yehor

posted an update 2 months ago

Esoteric practices: running model inference in PHP!

Repository: https://github.com/egorsmkv/speech-to-text-using-php

Yehor

posted an update 2 months ago

Made a working Rust program that uses the IREE runtime to run inference with the wav2vec2-bert model for Automatic Speech Recognition.

Yehor

posted an update 2 months ago

I have made a Rust project that integrates the latest state-of-the-art object detection model (RF-DETR), which outperforms YOLO!

Check it out: https://github.com/egorsmkv/rf-detr-usls

Yehor

posted an update 2 months ago

Convert your audio data to Parquet/DuckDB files at blazingly fast speeds!

Repository with pre-built binaries: https://github.com/crs-org/audios-to-dataset

Yehor

posted an update 3 months ago

Create spectrograms using Rust!

I slightly improved a nice project that creates spectrograms and built binaries for different platforms using cross-rs, which I mentioned earlier in my channel.

Repo: https://github.com/crs-org/sonogram
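
For readers unfamiliar with the term, here is a minimal sketch of what a spectrogram is: the signal is cut into overlapping frames, each frame is windowed, and the magnitude of its Fourier transform is stored. This is not the sonogram project's code; the function names, the frame size, and the hop size are assumptions chosen for illustration, and the naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(frame: list[float]) -> list[float]:
    """Magnitudes of bins 0..n/2 of a real-valued frame (naive O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples: list[float], frame_size: int = 64, hop: int = 32) -> list[list[float]]:
    """Return one list of bin magnitudes per frame (time-major layout)."""
    window = hann(frame_size)
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = samples[start:start + frame_size]
        frames.append(dft_magnitudes([s * w for s, w in zip(chunk, window)]))
    return frames


# Hypothetical input: a 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = spectrogram(tone)

# Each bin spans sample_rate / frame_size = 125 Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin * sample_rate / 64)
```

A real tool would use an FFT and map the magnitudes to colors; the framing and windowing steps stay the same.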

Yehor

posted an update 3 months ago

Added more pre-built executables to extract-audio, which I released recently.

See my previous post: https://huggingface.co/posts/Yehor/654118712490771

Repository: https://github.com/crs-org/extract-audio

Yehor

posted an update 3 months ago

Made a simple Python script to generate an Argilla project for audio annotation from a dataset:

https://github.com/egorsmkv/argilla-audio-annotation

Yehor

posted an update 3 months ago

Are you interested in different runtimes for AI models?

Check out IREE (iree.dev): it converts models to MLIR and then executes them on different platforms.

I have tested it in Rust on CPU and CUDA: https://github.com/egorsmkv/eerie-yolo11

Yehor

posted an update 3 months ago

Extract audio datasets with Rust at blazingly fast speeds!

With this tool you can extract audio files from a Parquet or Arrow file generated by the Hugging Face datasets library.

Repository: https://github.com/egorsmkv/extract-audio

Yehor

posted an update 3 months ago

If you spend a lot of time in Telegram, use this bot to monitor the state of your ML lab:

https://github.com/egorsmkv/gpu-state-tgbot
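
As a rough illustration of the kind of status message such a bot could send, the sketch below turns `nvidia-smi` CSV output into short text lines. It is an assumption that GPU state is read via `nvidia-smi`; the bot's actual implementation may differ, and `parse_gpu_csv`, `format_report`, and the sample text are hypothetical names and data.

```python
import csv
import io

# The sample below mimics the output of:
#   nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total \
#              --format=csv,noheader,nounits
SAMPLE = (
    "0, NVIDIA GeForce RTX 3090, 37, 10240, 24576\n"
    "1, NVIDIA GeForce RTX 3090, 0, 12, 24576\n"
)


def parse_gpu_csv(text: str) -> list[dict]:
    """Parse one CSV row per GPU into a dict with typed fields."""
    gpus = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:  # skip blank lines
            continue
        index, name, util, used, total = (cell.strip() for cell in row)
        gpus.append({
            "index": int(index),
            "name": name,
            "util_percent": int(util),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
        })
    return gpus


def format_report(gpus: list[dict]) -> str:
    """Build a plain-text message, one line per GPU."""
    return "\n".join(
        f"GPU {g['index']} ({g['name']}): {g['util_percent']}% util, "
        f"{g['mem_used_mib']}/{g['mem_total_mib']} MiB"
        for g in gpus
    )


gpus = parse_gpu_csv(SAMPLE)
report = format_report(gpus)
print(report)
```

In a live setup, the sample string would be replaced by the command's real output (for example, captured with `subprocess.run`), and the report would be sent through the Telegram Bot API.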

Yehor

posted an update 3 months ago

Published some datasets for researchers in Ukrainian NLP from my project https://ua-lawyer.com (Q&A platform in Ukraine):

Datasets:
- ua-l/topics
- ua-l/topics-train-test
- ua-l/topics-text-label

Model:
- https://huggingface.co/ua-l/topics-classifier

Space:
- https://huggingface.co/spaces/ua-l/topics-classifier-demo

Yehor

posted an update 4 months ago

Published a stable version of the Ukrainian Text-to-Speech library on GitHub and PyPI.

Features:

- Multi-speaker model: 2 female (Tetiana, Lada) + 1 male (Mykyta) voices;
- Fine-grained control over speech parameters, including duration, fundamental frequency (F0), and energy;
- High-fidelity speech generation using the RAD-TTS++ acoustic model;
- Fast vocoding using Vocos;
- Synthesizes long sentences effectively;
- Supports a sampling rate of 44.1 kHz;
- Tested on Linux environments and Windows/WSL;
- Python API (requires Python 3.9 or later);
- CUDA-enabled for GPU acceleration.

Repository: https://github.com/egorsmkv/tts_uk

Yehor

posted an update 4 months ago

Added advanced options to the RAD-TTS++ Space, so you can control Ukrainian speech synthesis more precisely.

Space: https://huggingface.co/spaces/Yehor/radtts-uk-vocos-demo

Team members 2
