Spaces: mustafoyev202/uzbek_stt

mustafoyev202 committed on Feb 15

Commit 7b7a648 (verified) · 1 Parent(s): 19cf637

Upload 4 files

Files changed (4)

.gitignore +176 -0
README.md +120 -14
model.py +209 -0
requirements.txt +6 -0

.gitignore ADDED

	@@ -0,0 +1,176 @@

+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+# C extensions
+*.so
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+# Translations
+*.mo
+*.pot
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+# Flask stuff:
+instance/
+.webassets-cache
+# Scrapy stuff:
+.scrapy
+# Sphinx documentation
+docs/_build/
+# PyBuilder
+.pybuilder/
+target/
+# Jupyter Notebook
+.ipynb_checkpoints
+# IPython
+profile_default/
+ipython_config.py
+# pyenv
+#   For a library or package, you might want to ignore these files since the code is
+#   intended to run in multiple environments; otherwise, check them in:
+# .python-version
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+# UV
+#   Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
+#   This is especially recommended for binary packages to ensure reproducibility, and is more
+#   commonly ignored for libraries.
+#uv.lock
+# poetry
+#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+#   This is especially recommended for binary packages to ensure reproducibility, and is more
+#   commonly ignored for libraries.
+#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+#poetry.lock
+# pdm
+#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#pdm.lock
+#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+#   in version control.
+#   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
+.pdm.toml
+.pdm-python
+.pdm-build/
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+# SageMath parsed files
+*.sage.py
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+# Spyder project settings
+.spyderproject
+.spyproject
+# Rope project settings
+.ropeproject
+# mkdocs documentation
+/site
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+# Pyre type checker
+.pyre/
+# pytype static type analyzer
+.pytype/
+# Cython debug symbols
+cython_debug/
+# PyCharm
+#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+#  and can be added to the global gitignore or merged into this file.  For a more nuclear
+#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
+#.idea/
+# Ruff stuff:
+.ruff_cache/
+# PyPI configuration file
+.pypirc
+.env

README.md CHANGED

@@ -1,14 +1,120 @@
----
-title: Uzbek Stt
-emoji: 🏢
-colorFrom: pink
-colorTo: gray
-sdk: streamlit
-sdk_version: 1.42.0
-app_file: app.py
-pinned: false
-license: mit
-short_description: Uzbek Speech-to-Text
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Uzbek Speech-to-Text with Grammar Correction
+A Speech-to-Text (STT) pipeline for the Uzbek language. It transcribes audio with a fine-tuned Wav2Vec2 model and then passes the transcript to an LLM served through the Groq API for grammar correction.
+## Features
+- Uzbek speech recognition using the fine-tuned `oyqiz/uzbek_stt` Wav2Vec2 model
+- Grammar correction using the LLaMA 3.3 70B model served by Groq
+- Streamlit web interface
+- Support for multiple audio formats (WAV, MP3, M4A, OGG)
+- Error handling and logging for model loading, transcription, and grammar correction
+- Python API (`UzbekSTT`) for integration into other projects
+## Installation
+1. Clone the repository:
+```bash
+git clone [your-repository-url]
+cd uzbek_stt
+```
+2. Install the required dependencies:
+```bash
+pip install -r requirements.txt
+```
+3. Set up your environment variables:
+```bash
+export GROQ_API_KEY="your-groq-api-key"
+```
+## Usage
+### Using the Streamlit Web Interface
+1. Start the Streamlit application:
+```bash
+streamlit run app.py
+```
+2. Open your web browser and navigate to the provided URL
+3. Upload an Uzbek audio file
+4. Click "Transcribe" to transcribe the audio and correct the resulting text
+### Using the Python API
+```python
+from model import UzbekSTT  # the class is defined in model.py
+# Initialize the pipeline
+stt = UzbekSTT()
+# Transcribe an audio file
+transcription = stt.transcribe("path/to/your/audio.wav")
+print(transcription)
+```
+## Requirements
+- Python 3.8+
+- PyTorch
+- Transformers
+- Librosa
+- Streamlit
+- LangChain Groq integration (`langchain_groq`)
+- python-dotenv
+- Groq API access
+## Model Details
+The pipeline uses two main components:
+1. **Speech Recognition**: Based on the `oyqiz/uzbek_stt` Wav2Vec2 model fine-tuned for Uzbek
+2. **Grammar Correction**: Uses the LLaMA 3.3 70B model (`llama-3.3-70b-versatile`) served through the Groq API, with an Uzbek system prompt that asks for only the corrected text
+## Environment Variables
+Required environment variables:
+- `GROQ_API_KEY`: Your Groq API key for accessing the LLM service
+## Error Handling
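+The pipeline logs errors and handles them as follows:
+- Missing audio files raise `FileNotFoundError`
+- Model loading failures are logged and re-raised
+- Transcription errors are logged and re-raised
+- Grammar correction (Groq API) failures are logged, and the uncorrected transcript is returned
+- A missing `GROQ_API_KEY` raises `ValueError`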
+## Logging
+Logging is configured to track:
+- Model initialization
+- Audio processing steps
+- Grammar correction progress
+- Error messages
+## Contributing
+1. Fork the repository
+2. Create your feature branch (`git checkout -b feature/amazing-feature`)
+3. Commit your changes (`git commit -m 'Add some amazing feature'`)
+4. Push to the branch (`git push origin feature/amazing-feature`)
+5. Open a Pull Request
+## License
+[Your chosen license]
+## Acknowledgments
+- Thanks to the Wav2Vec2 team for the base model architecture
+- Groq for providing the LLM API access
+- Contributors to the Uzbek language model training data
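
The README's "Environment Variables" and "Error Handling" sections describe checks that can be run before any model is loaded. The following is a minimal sketch of such a pre-flight check, not part of this commit: the function name `preflight` and the `SUPPORTED_EXTENSIONS` constant are hypothetical, and the extension list is copied from the formats named in the README.

```python
import os
from pathlib import Path

# Formats listed in the README; this constant is an assumption of the sketch.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".ogg"}


def preflight(audio_path, env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    # The pipeline refuses to start without a Groq API key.
    if not env.get("GROQ_API_KEY"):
        problems.append("GROQ_API_KEY is not set")
    path = Path(audio_path)
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported audio format: {path.suffix or '(none)'}")
    elif not path.is_file():
        problems.append(f"audio file not found: {path}")
    return problems
```

Calling `preflight("audio.wav")` before constructing `UzbekSTT()` would surface configuration problems without waiting for the Wav2Vec2 weights to download.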

model.py ADDED

	@@ -0,0 +1,209 @@

+import os
+import torch
+import logging
+import librosa
+from typing import Union, BinaryIO
+from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
+from langchain_groq import ChatGroq
+import streamlit as st
+from dotenv import load_dotenv
+load_dotenv()
+# Configure logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+class UzbekSTT:
+    """Enhanced Uzbek Speech-to-Text pipeline with grammar correction."""
+    # Set a class-level base model name
+    base_model_name = "oyqiz/uzbek_stt"
+    def __init__(self):
+        """Initialize the Uzbek STT pipeline with grammar correction."""
+        self.processor = None
+        self.model = None
+        self.groq_client = None
+        self.load_models()
+    def load_models(self) -> None:
+        """Load the base STT model and Groq client."""
+        try:
+            logger.info(f"Loading base Uzbek STT model: {self.base_model_name}")
+            self.processor = Wav2Vec2Processor.from_pretrained(self.base_model_name)
+            self.model = Wav2Vec2ForCTC.from_pretrained(self.base_model_name)
+            groq_api_key = os.getenv("GROQ_API_KEY")
+            if not groq_api_key:
+                raise ValueError("GROQ_API_KEY environment variable is required")
+            self.groq_client = ChatGroq(
+                model="llama-3.3-70b-versatile", temperature=0.3
+            )
+            logger.info("Models loaded successfully")
+        except Exception as e:
+            logger.error(f"Failed to initialize models: {str(e)}")
+            raise
+    def correct_grammar(self, text: str) -> str:
+        """Correct grammar in Uzbek text using Groq model."""
+        try:
+            messages = [
+                (
+                    "system",
+                    "Siz o'zbek tilida mutaxassissiz. Sizning vazifangiz berilgan o'zbek matnining grammatikasini to'g'rilash. Hech qanday izoh, tarjima yoki qo'shimcha ma'lumot bermang. Faqat to'g'rilangan o'zbek matnini qaytaring.",
+                ),
+                ("human", text),
+            ]
+            response = self.groq_client.invoke(messages)
+            return (
+                response.content.strip()
+                if hasattr(response, "content")
+                else str(response).strip()
+            )
+        except Exception as e:
+            logger.error(f"Grammar correction failed: {str(e)}")
+            return text
+    def transcribe(self, audio_file: Union[str, BinaryIO]) -> str:
+        """
+        Transcribe Uzbek speech to text with grammar correction.
+        Args:
+            audio_file: Path to audio file or file-like object
+        Returns:
+            str: Transcribed and grammar-corrected text
+        """
+        try:
+            # Validate and load audio
+            if isinstance(audio_file, str) and not os.path.exists(audio_file):
+                raise FileNotFoundError(f"Audio file not found: {audio_file}")
+            logger.info("Processing audio file...")
+            audio, _ = librosa.load(audio_file, sr=16000)
+            input_values = self.processor(
+                audio, return_tensors="pt", padding="longest", sampling_rate=16000
+            ).input_values
+            # Generate transcription
+            with torch.no_grad():
+                logits = self.model(input_values).logits
+                predicted_ids = torch.argmax(logits, dim=-1)
+            transcription = self.processor.batch_decode(predicted_ids)[0]
+            # Apply grammar correction
+            logger.info("Applying grammar correction...")
+            corrected_text = self.correct_grammar(transcription)
+            return corrected_text
+        except Exception as e:
+            logger.error(f"Transcription failed: {str(e)}")
+            raise
+    @classmethod
+    def from_pretrained(cls, model_name: str = "mustafoyev202/uzbek_stt"):
+        """Factory method for 🤗 Transformers compatibility."""
+        if model_name != "mustafoyev202/uzbek_stt":
+            logger.warning(
+                f"Using base model {cls.base_model_name} regardless of specified model name"
+            )
+        return cls()
+# ----------------- Streamlit App ----------------- #
+def main():
+    # Set Streamlit page configuration
+    st.set_page_config(
+        page_title="Uzbek STT with Grammar Correction",
+        page_icon="🗣️",
+        layout="centered",
+        initial_sidebar_state="auto",
+    )
+    # Inject custom CSS for a modern, beautiful design
+    st.markdown(
+        """
+        <style>
+        body {
+            background-color: #f0f2f6;
+        }
+        .main {
+            font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif;
+        }
+        .stButton>button {
+            background-color: #4CAF50;
+            color: white;
+            padding: 10px 24px;
+            border: none;
+            border-radius: 4px;
+            cursor: pointer;
+            font-size: 16px;
+        }
+        .stButton>button:hover {
+            background-color: #45a049;
+        }
+        .header {
+            text-align: center;
+            color: #2c3e50;
+            margin-bottom: 30px;
+        }
+        </style>
+        """,
+        unsafe_allow_html=True,
+    )
+    # App header
+    st.markdown(
+        "<h1 class='header'>🗣️ Uzbek Speech-to-Text & Grammar Correction</h1>",
+        unsafe_allow_html=True,
+    )
+    st.markdown(
+        """
+        Welcome to the **Uzbek STT** application, where cutting-edge technology meets
+        linguistic precision. Upload an Uzbek audio file, and let our model transcribe and
+        correct your text in real time!
+        """
+    )
+    # File uploader for audio files
+    uploaded_file = st.file_uploader(
+        "Upload your Uzbek audio file", type=["wav", "mp3", "m4a", "ogg"]
+    )
+    if uploaded_file is not None:
+        # Display an audio player using the upload's own MIME type
+        st.audio(uploaded_file, format=uploaded_file.type)
+        if st.button("Transcribe"):
+            # Save the upload to a temporary file only when it is needed,
+            # keeping the original extension (e.g. ".mp3")
+            extension = os.path.splitext(uploaded_file.name)[1]
+            temp_audio_path = f"temp_audio{extension}"
+            with open(temp_audio_path, "wb") as f:
+                f.write(uploaded_file.getvalue())
+            with st.spinner("Processing your audio file..."):
+                try:
+                    # Initialize the UzbekSTT pipeline
+                    uzbek_stt = UzbekSTT()
+                    # Transcribe and correct the audio
+                    transcription = uzbek_stt.transcribe(temp_audio_path)
+                    st.success("Transcription complete!")
+                    st.markdown("### Transcribed Text:")
+                    st.write(transcription)
+                except Exception as e:
+                    st.error(f"An error occurred: {str(e)}")
+                finally:
+                    # Clean up the temporary audio file
+                    if os.path.exists(temp_audio_path):
+                        os.remove(temp_audio_path)
+if __name__ == "__main__":
+    main()
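
In `transcribe`, the `torch.argmax` over the logits followed by `processor.batch_decode` amounts to greedy CTC decoding: take the most likely token per frame, collapse consecutive repeats, and drop the blank token. The sketch below illustrates that collapse step in plain Python. The toy vocabulary, the blank id of `0`, and the `|` word delimiter are assumptions for illustration; the real values come from the `oyqiz/uzbek_stt` tokenizer.

```python
from itertools import groupby


def ctc_greedy_collapse(ids, blank_id=0):
    """Collapse consecutive repeated ids, then drop the blank id."""
    return [key for key, _ in groupby(ids) if key != blank_id]


def decode(ids, vocab, blank_id=0, word_delimiter="|"):
    """Map collapsed ids to characters and turn the delimiter into spaces."""
    chars = [vocab[i] for i in ctc_greedy_collapse(ids, blank_id)]
    return "".join(chars).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary; not the model's real one.
toy_vocab = {0: "<pad>", 1: "s", 2: "a", 3: "l", 4: "o", 5: "m", 6: "|"}
frame_ids = [1, 1, 0, 2, 2, 3, 0, 4, 4, 5, 6, 6]
text = decode(frame_ids, toy_vocab)
```

The blank token is what lets a genuinely doubled letter survive: `[3, 0, 3]` collapses to two `l` characters, while `[3, 3, 3]` collapses to one.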

requirements.txt ADDED

	@@ -0,0 +1,6 @@

+huggingface_hub
+torch
+transformers
+librosa
+langchain_groq
+python-dotenv
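
`model.py` imports `streamlit`, which does not appear in `requirements.txt`; a Streamlit-SDK Space may provide it, but a local `pip install -r requirements.txt` may not. A small check like the one below could confirm that every module the script imports is available before launching. This is a sketch only: the helper name `missing_modules` is hypothetical, and the module list is copied by hand from the import block of `model.py`.

```python
from importlib.util import find_spec

# Top-level third-party modules imported by model.py (copied by hand).
MODEL_IMPORTS = [
    "torch",
    "librosa",
    "transformers",
    "langchain_groq",
    "streamlit",
    "dotenv",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(MODEL_IMPORTS)
    print("missing:", ", ".join(missing) if missing else "none")
```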