{ "cells": [ { "cell_type": "markdown", "source": [ "## Latvian speech recognition\n", "\n", "This notebook provides tools for recognizing Latvian speech. It uses the [speech recognition model](https://huggingface.co/AiLab-IMCS-UL/whisper-large-v3-lv-late-cv17) built by LU MII AiLab, trained on data collected through [Balsu talka](https://balsutalka.lv/).\n", "\n", "To recognize speech in an audio file, follow the steps listed below." ], "metadata": { "id": "zZBBTnW-aThp" } }, { "cell_type": "markdown", "source": [ "## 1. Change the runtime type to T4 GPU\n", "\n", "To do this, open `Runtime` -> `Change runtime type` in the main menu at the top of this page and select `T4 GPU`.\n", "\n", "" ], "metadata": { "id": "GALJps6fDlQD" } }, { "cell_type": "code", "source": [ "#@title 2. Click the play button to load the required tools.\n", "\n", "import ipywidgets as widgets\n", "from IPython.display import clear_output\n", "import torch\n", "from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline\n", "\n", "uploader = widgets.FileUpload(description='Choose audio', accept='audio/*', multiple=False)\n", "display(uploader)" ], "metadata": { "id": "zLDYIFciCMTw", "cellView": "form" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## 3. Choose an audio file\n", "Once the tools have loaded, a \"Choose audio\" button will appear. Click it and select the audio file in which to recognize Latvian speech." ], "metadata": { "id": "QxsuN4r0QGnl" } }, { "cell_type": "code", "source": [ "# @title 4. 
Run the speech recognition process\n", "\n", "if len(uploader.data) == 0:\n", "    display(widgets.HTML(\n", "        value=\"<h3>No audio file selected, please choose an audio file!</h3>\"\n", "    ))\n", "else:\n", "\n", "    display(widgets.HTML(\n", "        value=\"<h3>Loading...</h3>\"\n", "    ))\n", "\n", "    # Use the GPU and half precision when available\n", "    device = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n", "    torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32\n", "\n", "    model_id = \"AiLab-IMCS-UL/whisper-large-v3-lv-late-cv17\"\n", "\n", "    model = AutoModelForSpeechSeq2Seq.from_pretrained(\n", "        model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True\n", "    ).to(device)\n", "\n", "    processor = AutoProcessor.from_pretrained(model_id)\n", "\n", "    pipe = pipeline(\n", "        \"automatic-speech-recognition\",\n", "        generate_kwargs={\"language\": \"latvian\", \"task\": \"transcribe\"},\n", "        model=model,\n", "        tokenizer=processor.tokenizer,\n", "        feature_extractor=processor.feature_extractor,\n", "        max_new_tokens=225,\n", "        chunk_length_s=30,\n", "        batch_size=16,\n", "        return_timestamps=False,\n", "        torch_dtype=torch_dtype,\n", "        device=device,\n", "    )\n", "\n", "    clear_output()\n", "\n", "    display(widgets.HTML(\n", "        value=\"<h3>Recognizing speech in the audio...</h3>\"\n", "    ))\n", "\n", "    # uploader.data holds the raw bytes of the uploaded file\n", "    result = pipe(uploader.data[0])\n", "\n", "    # Save the transcript as UTF-8 so Latvian diacritics are preserved\n", "    with open('transcript.txt', 'w', encoding='utf-8') as f:\n", "        f.write(result[\"text\"])\n", "\n", "    clear_output()\n", "\n", "    display(widgets.HTML(\n", "        value=result[\"text\"]\n", "    ))" ], "metadata": { "cellView": "form", "id": "_6ovCwwqC6SM" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## 5. Save the text recognized from the audio file\n", "After the audio file has been processed, the recognized text will be printed below the step 4 cell, and it will also be saved to the text file `transcript.txt`. 
To view this file, click the folder icon in the side panel.\n", "\n", "\n" ], "metadata": { "id": "PLqlD2N7STNq" } } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }