---
title: LangGraph Data Analyst Agent
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: "1.28.0"
app_file: app.py
pinned: false
license: mit
---

# LangGraph Data Analyst Agent

An intelligent data analyst agent, built with LangGraph, that analyzes customer support conversations. It features advanced memory, conversation persistence, and query recommendations.

## Features

### Core Functionality

- **Multi-Agent Architecture**: Separate specialized agents for structured and unstructured queries
- **Query Classification**: Automatic routing of each query to the appropriate agent based on its type
- **Rich Tool Set**: Tools for dataset statistics, distributions, examples, search, summaries, and pattern analysis

### Advanced Memory & Persistence

- **Session Management**: Conversations persist across page reloads and browser sessions for as long as the app's server process keeps running
- **User Profile Tracking**: The agent learns and remembers user interests and preferences
- **Conversation History**: Full context retention using LangGraph checkpointers
- **Cross-Session Continuity**: Resume conversations using session IDs

### Intelligent Recommendations

- **Query Suggestions**: AI-powered recommendations based on conversation history
- **Interactive Refinement**: Refine suggested queries together with the agent before running them
- **Context-Aware**: Suggestions based on user profile and previous interactions
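
In the app, query suggestions come from the LLM. To make the idea concrete without a model, the sketch below fills hypothetical query templates with the topics from a user profile, in order of interest, and skips any query the user has already asked. `TEMPLATES`, `suggest_queries`, and the profile's `interests` / `query_history` keys are assumptions for illustration, not the app's code.

```python
from typing import Dict, List

# Hypothetical follow-up templates; {topic} is filled from the user's interests.
TEMPLATES = [
    "Show me 5 examples of {topic} requests",
    "What patterns do you see in {topic} conversations?",
    "Get the distribution of intents in the {topic} category",
]


def suggest_queries(profile: Dict[str, List[str]], limit: int = 3) -> List[str]:
    """Return up to `limit` follow-up queries for the user's interests,
    skipping any query already in their history (case-insensitive)."""
    asked = {q.lower() for q in profile.get("query_history", [])}
    suggestions: List[str] = []
    for topic in profile.get("interests", []):
        for template in TEMPLATES:
            query = template.format(topic=topic)
            if query.lower() not in asked:
                suggestions.append(query)
            if len(suggestions) == limit:
                return suggestions
    return suggestions


profile = {
    "interests": ["refund", "billing"],
    "query_history": ["Show me 5 examples of refund requests"],
}
for suggestion in suggest_queries(profile):
    print(suggestion)
```

Because the refund-examples query was already asked, the first suggestion moves on to refund patterns; an LLM-based recommender replaces the fixed templates with generated text but follows the same "use the profile, avoid repeats" idea.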

## Architecture

The agent is built as a LangGraph graph with the following components:

```
User Query → Classifier → [Structured Agent | Unstructured Agent | Recommender] → Summarizer → Response
                                        ↓
                         Tool Nodes (Dataset Analysis Tools)
```

### Agent Types

1. **Structured Agent**: Handles quantitative queries (statistics, examples, distributions)
2. **Unstructured Agent**: Handles qualitative queries (summaries, insights, patterns)
3. **Query Recommender**: Suggests follow-up questions based on context
4. **Summarizer**: Updates the user profile and conversation memory

## Setup Instructions

### Prerequisites

- **Python Version**: 3.9 or higher
- **API Key**: OpenAI API key or Nebius API key
- **For Hugging Face Spaces**: Ensure your API key is set as a Space secret

### Installation

1. **Clone the repository**:
   ```bash
   git clone <repository-url>
   cd Agents
   ```

2. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

3. **Configure API Key**:

   Create a `.env` file in the project root:
   ```bash
   # For OpenAI (recommended)
   OPENAI_API_KEY=your_openai_api_key_here

   # OR for Nebius
   NEBIUS_API_KEY=your_nebius_api_key_here
   ```

4. **Run the application**:
   ```bash
   streamlit run app.py
   ```

5. **Access the app**:
   Open `http://localhost:8501` in your browser.

### Alternative Deployment

#### For Hugging Face Spaces

1. **Fork or upload this repository to Hugging Face Spaces**
2. **Set your API key as a Space secret:**
   - Go to your Space settings
   - Navigate to "Variables and secrets"
   - Add a secret named `NEBIUS_API_KEY` or `OPENAI_API_KEY`
   - Enter your API key as the value
3. **The app will start automatically**

#### For Other Cloud Deployments

Set the API key as an environment variable:

```bash
export OPENAI_API_KEY=your_api_key_here
# OR
export NEBIUS_API_KEY=your_api_key_here
```
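
Whichever deployment you use, the key ends up as an environment variable. The sketch below shows one way an app could pick a provider from `OPENAI_API_KEY` or `NEBIUS_API_KEY`, preferring OpenAI as the `.env` example recommends. `resolve_api_key` is a hypothetical helper and the app's own precedence may differ. For local runs, if the project uses python-dotenv, calling its `load_dotenv()` first copies `.env` entries into `os.environ` (an assumption about this project's setup).

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (provider, key), preferring OpenAI over Nebius.

    Hypothetical helper; pass a dict to test it without touching os.environ.
    """
    env = os.environ if env is None else env
    for provider, var in (("openai", "OPENAI_API_KEY"), ("nebius", "NEBIUS_API_KEY")):
        key = env.get(var, "").strip()  # ignore unset or blank values
        if key:
            return provider, key
    raise RuntimeError("Set OPENAI_API_KEY or NEBIUS_API_KEY (in .env or as a Space secret).")


provider, _ = resolve_api_key({"NEBIUS_API_KEY": "nb-123"})
print(provider)  # nebius
```

Failing fast with a clear message at startup is easier to debug than an authentication error surfacing later from inside an agent call.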

## Usage Guide

### Query Types

#### Structured Queries (Quantitative Analysis)

- "How many records are in each category?"
- "What are the most common customer issues?"
- "Show me 5 examples of billing problems"
- "Get distribution of intents"

#### Unstructured Queries (Qualitative Analysis)

- "Summarize the refund category"
- "What patterns do you see in payment issues?"
- "Analyze customer sentiment in billing conversations"
- "What insights can you provide about technical support?"

#### Memory & Recommendations

- "What do you remember about me?"
- "What should I query next?"
- "Advise me what to explore"
- "Recommend follow-up questions"

### Session Management

#### Creating Sessions

- **New Session**: Click "New Session" to start fresh
- **Auto-Generated**: Each new browser session gets a unique ID

#### Resuming Sessions

1. Copy your session ID from the sidebar (e.g., `a1b2c3d4...`)
2. Enter the full session ID in "Join Existing Session"
3. Click "Join Session" to resume

#### Cross-Tab Persistence

- Open multiple tabs with the same session ID
- Conversations sync across all tabs
- Memory and user profile persist
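
The session flow above can be sketched with the standard library: a new session gets a random UUID, a pasted ID is validated before joining (so a shortened ID like `a1b2c3d4` is rejected), and the ID becomes the LangGraph `thread_id` in the run config. The helper names and the assumption that session IDs are UUID4 strings are hypothetical; the `{"configurable": {"thread_id": ...}}` shape is what LangGraph checkpointers read.

```python
import uuid
from typing import Any, Dict


def new_session_id() -> str:
    """Create a fresh, random session ID (UUID4 as a string)."""
    return str(uuid.uuid4())


def parse_session_id(text: str) -> str:
    """Validate a pasted session ID and normalize it to canonical form.

    Raises ValueError for malformed or truncated input.
    """
    return str(uuid.UUID(text.strip()))


def session_config(session_id: str) -> Dict[str, Any]:
    """Build the per-run config LangGraph uses to select a checkpoint thread."""
    return {"configurable": {"thread_id": session_id}}


sid = new_session_id()
config = session_config(parse_session_id(sid))
print(config["configurable"]["thread_id"] == sid)  # True
```

Normalizing the pasted ID (trimming whitespace, lowercasing via `uuid.UUID`) means a copy-paste with stray spaces still resolves to the same thread.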

## Memory System

### User Profile Tracking

The agent automatically tracks:

- **Interests**: Topics and categories you frequently ask about
- **Expertise Level**: Inferred from question complexity (beginner/intermediate/advanced)
- **Preferences**: Analysis style preferences (quantitative vs. qualitative)
- **Query History**: Recent questions for context
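
The app extracts these traits with an LLM. As a rule-based stand-in that shows the shape of the data, the sketch below counts topic mentions as interests, tallies quantitative vs. qualitative phrasing as a preference, and keeps a bounded query history. The keyword lists, the `UserProfile` class, and the history length of 5 are assumptions; expertise level is left out because it does not reduce well to keyword rules.

```python
from collections import Counter, deque
from dataclasses import dataclass, field
from typing import Deque, List

# Hypothetical keyword lists; the app infers these traits with an LLM instead.
TOPICS = ("billing", "refund", "payment", "shipping", "account")
QUANT_WORDS = ("how many", "distribution", "most common", "examples", "statistics")


@dataclass
class UserProfile:
    interests: Counter = field(default_factory=Counter)
    quantitative: int = 0
    qualitative: int = 0
    history: Deque[str] = field(default_factory=lambda: deque(maxlen=5))

    def update(self, query: str) -> None:
        """Fold one query into the profile."""
        q = query.lower()
        self.interests.update(t for t in TOPICS if t in q)
        if any(w in q for w in QUANT_WORDS):
            self.quantitative += 1
        else:
            self.qualitative += 1
        self.history.append(query)  # the deque drops the oldest entry past maxlen

    @property
    def preference(self) -> str:
        if self.quantitative == self.qualitative:
            return "mixed"
        return "quantitative" if self.quantitative > self.qualitative else "qualitative"

    def top_interests(self, n: int = 3) -> List[str]:
        return [topic for topic, _ in self.interests.most_common(n)]


profile = UserProfile()
for q in [
    "Show me 5 examples of billing problems",
    "What patterns do you see in refund and billing issues?",
    "How many refund requests are there?",
]:
    profile.update(q)
print(profile.top_interests(), profile.preference)  # ['billing', 'refund'] quantitative
```

A structure like this is what the summarizer would store in `user_profile` between turns; an LLM-based version fills the same fields from free-form conversation rather than fixed keywords.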
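
### Conversation Persistence

- **Thread-based**: Each session has a unique thread ID
- **Checkpoint System**: LangGraph automatically saves a checkpoint of the state after each step of the graph
- **Cross-Session**: Resume a conversation later by rejoining its session ID. Because `MemorySaver` keeps checkpoints in process memory, they last only as long as the app process runs; a restart (for example, when a Space goes to sleep) clears them.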

### Memory Queries

Ask the agent what it remembers:

```
"What do you remember about me?"
"What are my interests?"
"What have I asked about before?"
```

## Testing the Agent

### Basic Functionality Tests

1. **Classification Test**:
   ```
   Query: "How many categories are there?"
   Expected: Routes to Structured Agent → Uses get_dataset_stats tool
   ```

2. **Follow-up Memory Test**:
   ```
   Query 1: "Show me billing examples"
   Query 2: "Show me more examples"
   Expected: Agent remembers previous context about billing
   ```

3. **User Profile Test**:
   ```
   Query 1: "I'm interested in refund patterns"
   Query 2: "What do you remember about me?"
   Expected: Agent mentions interest in refunds
   ```

4. **Recommendation Test**:
   ```
   Query: "What should I query next?"
   Expected: Personalized suggestions based on history
   ```

### Advanced Feature Tests

1. **Session Persistence**:
   - Ask a question, then reload the page
   - Verify that the conversation history remains
   - Verify that the user profile persists

2. **Cross-Session Memory**:
   - Note your session ID
   - Close the browser completely
   - Reopen it and join the same session
   - Verify that the full conversation and profile are restored

3. **Interactive Recommendations**:
   ```
   User: "Advise me what to query next"
   Agent: "Based on your interest in billing, you might want to analyze refund patterns."
   User: "I'd rather see examples instead"
   Agent: "Then I suggest showing 5 examples of refund requests."
   User: "Please do so"
   Expected: Agent executes the refined query
   ```

## File Structure

```
Agents/
├── README.md              # This file
├── requirements.txt       # Python dependencies
├── .env                   # API keys (create this)
├── app.py                 # LangGraph Streamlit app
├── langgraph_agent.py     # LangGraph agent implementation
├── agent-memory.ipynb     # Memory example notebook
├── test_agent.py          # Test suite
└── DEPLOYMENT_GUIDE.md    # Original deployment guide
```

## Technical Implementation

### LangGraph Components

**State Management**:

```python
from typing import Any, Dict, List, Optional, TypedDict


class AgentState(TypedDict):
    messages: List[Any]                        # conversation messages (no reducer: updates replace the list)
    query_type: Optional[str]                  # route chosen by the classifier
    user_profile: Optional[Dict[str, Any]]     # interests, expertise, preferences
    session_context: Optional[Dict[str, Any]]  # additional per-session context
```

**Tool Categories**:

- **Structured Tools**: Statistics, distributions, examples, search
- **Unstructured Tools**: Summaries, insights, pattern analysis
- **Memory Tools**: Profile updates, preference tracking
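
At its core, a structured tool is a plain function over the dataset. The sketch below is a hypothetical counting tool of the kind the Classification Test expects `get_dataset_stats` to be; its name (`count_by_column`), signature, and sample rows are assumptions. With LangChain, such a function is typically exposed to the LLM by decorating it with `@tool` from `langchain_core.tools`, which uses the docstring as the tool description; the decorator is omitted here so the sketch runs on the standard library alone.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical sample rows; the app loads its real dataset from Hugging Face.
RECORDS: List[Dict[str, str]] = [
    {"category": "REFUND", "intent": "get_refund"},
    {"category": "REFUND", "intent": "track_refund"},
    {"category": "PAYMENT", "intent": "payment_issue"},
]


def count_by_column(column: str = "category") -> Dict[str, int]:
    """Count rows per distinct value of `column`, most frequent first."""
    if RECORDS and column not in RECORDS[0]:
        raise KeyError(f"unknown column: {column!r}")
    return dict(Counter(row[column] for row in RECORDS).most_common())


print(count_by_column())  # {'REFUND': 2, 'PAYMENT': 1}
```

Raising on an unknown column gives the agent an explicit error it can report or recover from, instead of a silent empty result.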

**Graph Flow**:

1. **Classifier**: Determines the query type
2. **Agent Selection**: Routes to the appropriate specialist agent
3. **Tool Execution**: Calls tools dynamically as the query requires
4. **Memory Update**: Updates the user profile and context
5. **Response Generation**: Produces the final answer, informed by memory

### Memory Architecture

- **Checkpointer**: LangGraph's `MemorySaver` for conversation persistence
- **Thread Management**: Unique thread IDs for session isolation
- **Profile Synthesis**: LLM-powered extraction of user characteristics
- **Context Retention**: Full conversation history with temporal awareness

## Troubleshooting

### Common Issues

1. **API Key Errors**:
   - Verify that the `.env` file exists and contains the correct key
   - Check that the environment variable is set in your deployment
   - Ensure that your API key has sufficient credits

2. **Memory Not Persisting**:
   - Verify that the session ID remains consistent
   - Check that browser localStorage is not being cleared
   - Ensure that the `thread_id` parameter is passed correctly
   - Note that `MemorySaver` keeps checkpoints in memory, so restarting the app clears all sessions

3. **Dataset Loading Issues**:
   - Check your internet connection (the dataset is downloaded from Hugging Face)
   - Verify that the `datasets` library is installed
   - Try clearing the Streamlit cache: `streamlit cache clear`

4. **Tool Execution Errors**:
   - Verify that all dependencies in `requirements.txt` are installed
   - Check that the dataset loaded correctly
   - Review the error messages in the Streamlit interface

### Debug Mode

Enable debug logging by running this at startup (for example, at the top of `app.py`):

```python
import logging

# force=True (Python 3.8+) replaces any handlers already attached to the root
# logger, so this takes effect even if something configured logging earlier.
logging.basicConfig(level=logging.DEBUG, force=True)
```

## Learning Objectives

This implementation demonstrates:

1. **LangGraph Multi-Agent Systems**: Specialized agents for different query types
2. **Memory & Persistence**: Conversation continuity across sessions
3. **Tool Integration**: Dynamic tool selection and execution
4. **State Management**: Complex state updates and routing
5. **User Experience**: Session management and interactive features

## Future Enhancements

Potential improvements:

- **Database Persistence**: Replace `MemorySaver` with a PostgreSQL checkpointer
- **Advanced Analytics**: More sophisticated data analysis tools
- **Export Features**: PDF/CSV report generation
- **User Authentication**: Multi-user support with profiles
- **Real-time Collaboration**: Shared sessions between users

## License

This project is for educational purposes as part of a data science curriculum.

## Contributing

This is an assignment project. For questions or issues, please contact the course instructors.

---

**Built with**: LangGraph, Streamlit, OpenAI/Nebius, Hugging Face Datasets