ParulPandey committed on
Commit
5e13020
·
verified ·
1 Parent(s): 19fe748

Update README.md

Files changed (1)
  1. README.md +14 -16
README.md CHANGED
@@ -22,30 +22,28 @@ With ReadRight, students can read short, age-appropriate stories generated just
22
  This project is an MVP (Minimum Viable Product) submitted for **Track 3: Agentic Demo Showcase** at the hackathon, showcasing the power of AI agents through a multi-component architecture built with Gradio, smolagents, and advanced AI services.
23
 
24
  ## 🎯 Motivation
25
- Non-native English speakers often struggle with pronunciation due to the language’s tricky phonetics (e.g., "knight," "through," "psychology"). Many lack access to affordable, patient, and judgment-free practice environments, leading to embarrassment and reluctance to speak. ReadRight solves this by offering personalized, AI-driven reading practice that adapts to each student’s level and interests, fostering confidence and fluency.
26
 
27
- ## 🛠️ Technical Architecture
28
- ReadRight leverages a modular architecture integrated via Gradio Spaces API, utilizing AI agents powered by smolagents in two distinct phases to provide a dynamic, adaptive learning experience. The system employs a multi-step agentic workflow where the LLM’s outputs control the program’s flow, particularly in adapting content based on student performance.
29
 
30
- 1. **Content and Audio Generation**:
31
- - A Gradio interface collects student details (name, grade, topic).
32
- - **Story Generation Agent**: Using Google Gemini and smolagents’ tool-calling capabilities, this agent autonomously generates engaging, personalized stories tailored to the student’s grade and interests. The agent dynamically adjusts story length, vocabulary, and complexity based on the student’s grade level, ensuring age-appropriate content without explicit user instruction beyond initial inputs. For example, it selects simpler words for younger students or more complex sentences for older ones, making decisions on content structure internally.
33
- - **Audio Synthesis**: Hugging Face TTS (NihalGazi/Text-To-Speech-Unlimited) converts stories into natural-sounding audio for pronunciation guidance. This phase uses LLM outputs as a processor, initiated by user input, but the story generation agent exhibits autonomy in crafting tailored content.
34
 
35
- 2. **Adaptive Feedback and Learning**:
36
- - After the student records their reading, the system activates a multi-step agentic workflow powered by smolagents:
37
- - **Speech Recognition**: Whisper Large V2 (abidlabs/whisper-large-v2) transcribes student recordings accurately.
38
- - **Text Comparison**: A custom Python engine with `difflib` compares the transcription to the original text, identifying errors and mispronunciations.
39
- - **Feedback Generation Agent**: Leveraging smolagents, the LLM generates detailed, encouraging feedback with pronunciation tips, adapting to the student’s performance. It autonomously decides to generate new stories incorporating previously missed or mispronounced words, based on feedback analysis, forming a loop where the LLM determines the next action to address learning gaps.
40
- - This phase operates as a multi-step agent, where the LLM controls iteration and program continuation by analyzing performance data and adapting content without explicit user instruction, creating a tailored learning path.
41
 
42
- The combination of these phases creates a semi-autonomous system: the content generation phase relies on user-initiated inputs but includes an agentic story generation process, while the adaptive feedback phase demonstrates stronger agentic behavior by dynamically adjusting to student needs, making ReadRight a powerful example of AI-driven educational support powered by smolagents.
43
 
44
  ## 🎥 Demo
45
  📺 [Watch the ReadRight Demo Video](#) *(Link to be added)*
46
 
47
- ---
48
 
49
 
50
- Let’s make reading practice accessible and fun for every student! 🌍📚
51
 
 
22
  This project is an MVP (Minimum Viable Product) submitted for **Track 3: Agentic Demo Showcase** at the hackathon, showcasing the power of AI agents through a multi-component architecture built with Gradio, smolagents, and advanced AI services.
23
 
24
  ## 🎯 Motivation
25
+ English pronunciation can be tough for non-native speakers—words like "knight," "through," or "psychology" don’t sound how they look. Many students feel shy about speaking because they lack a safe, affordable space to practice without judgment. ReadRight tackles this by offering personalized, AI-powered reading practice that adapts to each student’s level and interests, helping them gain confidence and fluency.
26
 
27
+ ## 🛠️ How It Works
28
+ ReadRight uses a modular setup built on the Gradio Spaces API and smolagents to deliver a dynamic, adaptive learning experience. AI agents run in two key phases, with the LLM driving a multi-step agentic workflow, especially when adapting to student progress.
29
 
30
+ 1. **Story Creation and Audio**:
31
+ - Students enter their name, grade, and a topic they like through a simple Gradio interface.
32
+ - **Story Generation Agent**: Powered by Google Gemini and smolagents’ tool-calling features, this agent crafts engaging, personalized stories tailored to the student’s grade and interests. It automatically adjusts story length, vocabulary, and complexity (for example, using simple words for younger kids or richer sentences for older ones), making smart choices about content without needing extra user input.
33
+ - **Audio Creation**: Hugging Face TTS (NihalGazi/Text-To-Speech-Unlimited) turns stories into natural-sounding audio to guide pronunciation. This phase starts with user input, but the story agent works autonomously to shape the content.
34
 
35
+ 2. **Feedback and Adaptive Learning**:
36
+ - Once a student records their reading, a multi-step agentic workflow kicks in, powered by smolagents:
37
+ - **Speech Recognition**: Whisper Large V2 (abidlabs/whisper-large-v2) transcribes the student’s reading accurately.
38
+ - **Text Comparison**: A custom Python engine using `difflib` compares the transcription to the original story, spotting errors and mispronunciations.
39
+ - **Feedback Agent**: The LLM generates friendly, detailed feedback with pronunciation tips tailored to the student’s performance. It also decides when to create new stories that focus on words the student found tricky, forming a loop where it adapts content based on performance without needing extra user prompts.
40
+ - This phase shines as a multi-step agent, with the LLM analyzing data and choosing next steps to create a personalized learning path.
41
 
42
+ Together, these phases form a semi-autonomous system: story creation starts with user input but uses an agent to craft tailored content, while the feedback phase is highly agentic, dynamically adjusting to each student’s needs. This makes ReadRight a strong showcase of AI-driven education with smolagents.
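
The README does not show the comparison engine itself, so the following is only a minimal sketch of how a word-level `difflib` comparison between the original story and the transcription could work. The function names `tokenize` and `find_missed_words` are hypothetical, and the normalization (lowercasing, stripping punctuation) is an assumption rather than the project's actual behavior.

```python
import difflib
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def find_missed_words(original: str, transcription: str) -> list[str]:
    """Return words from the original story with no match in the transcription.

    Hypothetical helper: the real ReadRight engine may differ.
    """
    expected = tokenize(original)
    spoken = tokenize(transcription)
    # Compare word sequences rather than characters; autojunk is disabled
    # so that frequent words such as "the" are never treated as junk.
    matcher = difflib.SequenceMatcher(a=expected, b=spoken, autojunk=False)
    missed = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace" means the student said something else in place of these
        # words; "delete" means the words were skipped entirely.
        if tag in ("replace", "delete"):
            missed.extend(expected[i1:i2])
    return missed


if __name__ == "__main__":
    story = "The knight rode through the forest."
    heard = "the night rode true the forest"
    print(find_missed_words(story, heard))  # ['knight', 'through']
```

A list like this could then be handed to the feedback agent as the set of words to target in the next story. Note that a transcription-based check only flags words the speech recognizer heard differently, so it approximates pronunciation errors rather than measuring them directly.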
43
 
44
  ## 🎥 Demo
45
  📺 [Watch the ReadRight Demo Video](#) *(Link to be added)*
46
 
 
47
 
48
 
 
49