---
title: LangGraph Data Analyst Agent
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: "1.28.0"
app_file: app.py
pinned: false
license: mit
---
# 🤖 LangGraph Data Analyst Agent
An intelligent data analyst agent built with LangGraph that analyzes customer support conversations with advanced memory, conversation persistence, and query recommendations.
## 🌟 Features
### Core Functionality
- **Multi-Agent Architecture**: Separate specialized agents for structured and unstructured queries
- **Query Classification**: Automatic routing to appropriate agent based on query type
- **Rich Tool Set**: Comprehensive tools for data analysis and insights
### Advanced Memory & Persistence
- **Session Management**: Persistent conversations across page reloads and browser sessions
- **User Profile Tracking**: Agent learns and remembers user interests and preferences
- **Conversation History**: Full context retention using LangGraph checkpointers
- **Cross-Session Continuity**: Resume conversations using session IDs
### Intelligent Recommendations
- **Query Suggestions**: AI-powered recommendations based on conversation history
- **Interactive Refinement**: Collaborative query building with the agent
- **Context-Aware**: Suggestions based on user profile and previous interactions
## πŸ—οΈ Architecture
The agent uses LangGraph's multi-agent architecture with the following components:
```
User Query → Classifier → [Structured Agent | Unstructured Agent | Recommender] → Summarizer → Response
                                         ↓
                          Tool Nodes (Dataset Analysis Tools)
```
### Agent Types
1. **Structured Agent**: Handles quantitative queries (statistics, examples, distributions)
2. **Unstructured Agent**: Handles qualitative queries (summaries, insights, patterns)
3. **Query Recommender**: Suggests follow-up questions based on context
4. **Summarizer**: Updates user profile and conversation memory
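To make the routing step concrete, here is a minimal keyword-based classifier sketch. This is illustrative only: the actual agent classifies queries with an LLM prompt, and the hint lists below are invented for the example.

```python
# Sketch of the classifier's routing decision (illustrative only; the real
# agent uses an LLM to classify queries, not keyword matching).
STRUCTURED_HINTS = ("how many", "count", "distribution", "examples", "show me")
RECOMMENDER_HINTS = ("recommend", "what should i query", "advise")

def classify_query(query: str) -> str:
    """Return the name of the specialist node that should handle the query."""
    q = query.lower()
    if any(hint in q for hint in RECOMMENDER_HINTS):
        return "recommender"
    if any(hint in q for hint in STRUCTURED_HINTS):
        return "structured_agent"
    # Anything else falls through to qualitative analysis.
    return "unstructured_agent"
```

Queries that survive neither hint list default to the unstructured agent, mirroring how the graph treats open-ended questions.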
## 🚀 Setup Instructions
### Prerequisites
- **Python Version**: 3.9 or higher
- **API Key**: OpenAI API key or Nebius API key
- **For Hugging Face Spaces**: Ensure your API key is set as a Space secret
### Installation
1. **Clone the repository**:
```bash
git clone <repository-url>
cd Agents
```
2. **Install dependencies**:
```bash
pip install -r requirements.txt
```
3. **Configure API Key**:
Create a `.env` file in the project root:
```bash
# For OpenAI (recommended)
OPENAI_API_KEY=your_openai_api_key_here
# OR for Nebius
NEBIUS_API_KEY=your_nebius_api_key_here
```
4. **Run the application**:
```bash
streamlit run app.py
```
5. **Access the app**:
Open your browser to `http://localhost:8501`
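Since either provider key works, the lookup in step 3 amounts to checking the environment in preference order. A stdlib sketch (the actual logic in `app.py` may differ; the function name here is an assumption):

```python
import os
from typing import Tuple

def resolve_api_key() -> Tuple[str, str]:
    """Pick whichever provider key is configured, preferring OpenAI.

    Illustrative sketch; the app's real key-loading code may differ.
    """
    if os.getenv("OPENAI_API_KEY"):
        return "openai", os.environ["OPENAI_API_KEY"]
    if os.getenv("NEBIUS_API_KEY"):
        return "nebius", os.environ["NEBIUS_API_KEY"]
    raise RuntimeError("Set OPENAI_API_KEY or NEBIUS_API_KEY in .env or the environment")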
### Alternative Deployment
#### For Hugging Face Spaces:
1. **Fork or upload this repository to Hugging Face Spaces**
2. **Set your API key as a Space secret:**
- Go to your Space settings
- Navigate to "Variables and secrets"
- Add a secret named `NEBIUS_API_KEY` or `OPENAI_API_KEY`
- Enter your API key as the value
3. **The app will start automatically**
#### For other cloud deployment:
```bash
export OPENAI_API_KEY=your_api_key_here
# OR
export NEBIUS_API_KEY=your_api_key_here
```
## 🎯 Usage Guide
### Query Types
#### Structured Queries (Quantitative Analysis)
- "How many records are in each category?"
- "What are the most common customer issues?"
- "Show me 5 examples of billing problems"
- "Get distribution of intents"
#### Unstructured Queries (Qualitative Analysis)
- "Summarize the refund category"
- "What patterns do you see in payment issues?"
- "Analyze customer sentiment in billing conversations"
- "What insights can you provide about technical support?"
#### Memory & Recommendations
- "What do you remember about me?"
- "What should I query next?"
- "Advise me what to explore"
- "Recommend follow-up questions"
### Session Management
#### Creating Sessions
- **New Session**: Click "🆕 New Session" to start fresh
- **Auto-Generated**: Each new browser session gets a unique ID
#### Resuming Sessions
1. Copy your session ID from the sidebar (e.g., `a1b2c3d4...`)
2. Enter the full session ID in "Join Existing Session"
3. Click "🔗 Join Session" to resume
#### Cross-Tab Persistence
- Open multiple tabs with the same session ID
- Conversations sync across all tabs
- Memory and user profile persist
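A session ID like the one shown in the sidebar can be produced with the standard `uuid` module (a sketch; the app's actual ID scheme is an assumption here):

```python
import uuid

def new_session_id() -> str:
    # A hex UUID4 gives a copyable handle users can paste to rejoin a session.
    return uuid.uuid4().hex
```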
## 🧠 Memory System
### User Profile Tracking
The agent automatically tracks:
- **Interests**: Topics and categories you frequently ask about
- **Expertise Level**: Inferred from question complexity (beginner/intermediate/advanced)
- **Preferences**: Analysis style preferences (quantitative vs qualitative)
- **Query History**: Recent questions for context
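The tracked profile can be pictured as a small record that accumulates topics and queries over time. The field and method names below are assumptions for illustration, not the app's actual schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class UserProfile:
    """Illustrative shape of the tracked profile (field names are assumed)."""
    interests: List[str] = field(default_factory=list)
    expertise_level: str = "beginner"
    query_history: List[str] = field(default_factory=list)

    def record_query(self, query: str, topics: List[str]) -> None:
        # Keep every query for context, but deduplicate interests.
        self.query_history.append(query)
        for topic in topics:
            if topic not in self.interests:
                self.interests.append(topic)
```

In the real agent an LLM extracts the topics and expertise level from the conversation rather than receiving them as arguments.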
### Conversation Persistence
- **Thread-based**: Each session has a unique thread ID
- **Checkpoint System**: LangGraph automatically saves state after each interaction
- **Cross-Session**: Resume conversations days or weeks later
### Memory Queries
Ask the agent what it remembers:
```
"What do you remember about me?"
"What are my interests?"
"What have I asked about before?"
```
## 🔧 Testing the Agent
### Basic Functionality Tests
1. **Classification Test**:
```
Query: "How many categories are there?"
Expected: Routes to Structured Agent → Uses get_dataset_stats tool
```
2. **Follow-up Memory Test**:
```
Query 1: "Show me billing examples"
Query 2: "Show me more examples"
Expected: Agent remembers previous context about billing
```
3. **User Profile Test**:
```
Query 1: "I'm interested in refund patterns"
Query 2: "What do you remember about me?"
Expected: Agent mentions interest in refunds
```
4. **Recommendation Test**:
```
Query: "What should I query next?"
Expected: Personalized suggestions based on history
```
### Advanced Feature Tests
1. **Session Persistence**:
- Ask a question, reload the page
- Verify conversation history remains
- Verify user profile persists
2. **Cross-Session Memory**:
- Note your session ID
- Close browser completely
- Reopen and join the same session
- Verify full conversation and profile restoration
3. **Interactive Recommendations**:
```
User: "Advise me what to query next"
Agent: "Based on your interest in billing, you might want to analyze refund patterns."
User: "I'd rather see examples instead"
Agent: "Then I suggest showing 5 examples of refund requests."
User: "Please do so"
Expected: Agent executes the refined query
```
## πŸ“ File Structure
```
Agents/
├── README.md             # This file
├── requirements.txt      # Python dependencies
├── .env                  # API keys (create this)
├── app.py                # LangGraph Streamlit app
├── langgraph_agent.py    # LangGraph agent implementation
├── agent-memory.ipynb    # Memory example notebook
├── test_agent.py         # Test suite
└── DEPLOYMENT_GUIDE.md   # Original deployment guide
```
## πŸ› οΈ Technical Implementation
### LangGraph Components
**State Management**:
```python
from typing import Any, Dict, List, Optional, TypedDict

class AgentState(TypedDict):
    messages: List[Any]
    query_type: Optional[str]
    user_profile: Optional[Dict[str, Any]]
    session_context: Optional[Dict[str, Any]]
```
**Tool Categories**:
- **Structured Tools**: Statistics, distributions, examples, search
- **Unstructured Tools**: Summaries, insights, pattern analysis
- **Memory Tools**: Profile updates, preference tracking
**Graph Flow**:
1. **Classifier**: Determines query type
2. **Agent Selection**: Routes to appropriate specialist
3. **Tool Execution**: Dynamic tool usage based on needs
4. **Memory Update**: Profile and context updates
5. **Response Generation**: Final answer with memory integration
### Memory Architecture
**Checkpointer**: LangGraph's `MemorySaver` for conversation persistence
**Thread Management**: Unique thread IDs for session isolation
**Profile Synthesis**: LLM-powered extraction of user characteristics
**Context Retention**: Full conversation history with temporal awareness
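Conceptually, the checkpointer behaves like a map from thread ID to saved state. The real implementation is LangGraph's `MemorySaver`; this stdlib stand-in only illustrates the thread-isolation idea:

```python
import copy
from typing import Any, Dict

class ToyCheckpointer:
    """Stdlib stand-in for a checkpointer: one saved state per thread ID."""

    def __init__(self) -> None:
        self._states: Dict[str, Dict[str, Any]] = {}

    def save(self, thread_id: str, state: Dict[str, Any]) -> None:
        # Deep-copy so later mutations don't corrupt the checkpoint.
        self._states[thread_id] = copy.deepcopy(state)

    def load(self, thread_id: str) -> Dict[str, Any]:
        # Unknown threads start from an empty conversation.
        return copy.deepcopy(self._states.get(thread_id, {"messages": []}))
```

Because each session carries its own `thread_id`, two browser tabs joined to the same session read and write the same checkpoint, while a new session starts empty.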
## πŸ” Troubleshooting
### Common Issues
1. **API Key Errors**:
- Verify `.env` file exists and has correct key
- Check environment variable is set in deployment
- Ensure API key has sufficient credits
2. **Memory Not Persisting**:
- Verify session ID remains consistent
- Check browser localStorage not being cleared
- Ensure thread_id parameter is passed correctly
3. **Dataset Loading Issues**:
- Check internet connection for Hugging Face datasets
- Verify datasets library is installed
- Try clearing Streamlit cache: `streamlit cache clear`
4. **Tool Execution Errors**:
- Verify all dependencies in requirements.txt are installed
- Check dataset is properly loaded
- Review error messages in Streamlit interface
### Debug Mode
Enable debug logging by setting:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
## 🎓 Learning Objectives
This implementation demonstrates:
1. **LangGraph Multi-Agent Systems**: Specialized agents for different query types
2. **Memory & Persistence**: Conversation continuity across sessions
3. **Tool Integration**: Dynamic tool selection and execution
4. **State Management**: Complex state updates and routing
5. **User Experience**: Session management and interactive features
## 🚀 Future Enhancements
Potential improvements:
- **Database Persistence**: Replace MemorySaver with PostgreSQL checkpointer
- **Advanced Analytics**: More sophisticated data analysis tools
- **Export Features**: PDF/CSV report generation
- **User Authentication**: Multi-user support with profiles
- **Real-time Collaboration**: Shared sessions between users
## 📄 License
This project is for educational purposes as part of a data science curriculum.
## 🤝 Contributing
This is an assignment project. For questions or issues, please contact the course instructors.
---
**Built with**: LangGraph, Streamlit, OpenAI/Nebius, Hugging Face Datasets