SaritMeshesha / langraph-llm-data-analyst-agent
	---
	title: LangGraph Data Analyst Agent
	emoji: 🤖
	colorFrom: blue
	colorTo: purple
	sdk: streamlit
	sdk_version: "1.28.0"
	app_file: app.py
	pinned: false
	license: mit
	---

	# 🤖 LangGraph Data Analyst Agent

A data analyst agent built with LangGraph that answers questions about a dataset of customer support conversations. It adds conversation memory, session persistence, and personalized query recommendations.

	## 🌟 Features

	### Core Functionality
	- Multi-Agent Architecture: Separate specialized agents for structured and unstructured queries
	- Query Classification: Automatic routing to appropriate agent based on query type
- Tool Set: Tools for dataset statistics, distributions, examples, search, summaries, and pattern analysis

	### Advanced Memory & Persistence
	- Session Management: Persistent conversations across page reloads and browser sessions
	- User Profile Tracking: Agent learns and remembers user interests and preferences
	- Conversation History: Full context retention using LangGraph checkpointers
	- Cross-Session Continuity: Resume conversations using session IDs

	### Intelligent Recommendations
	- Query Suggestions: AI-powered recommendations based on conversation history
	- Interactive Refinement: Collaborative query building with the agent
	- Context-Aware: Suggestions based on user profile and previous interactions

	## 🏗️ Architecture

The agent is implemented as a LangGraph graph with the following components:

```
User Query → Classifier → [Structured Agent | Unstructured Agent | Recommender] → Summarizer → Response
                                     ↓
                       Tool Nodes (Dataset Analysis Tools)
```

	### Agent Types
	1. Structured Agent: Handles quantitative queries (statistics, examples, distributions)
	2. Unstructured Agent: Handles qualitative queries (summaries, insights, patterns)
	3. Query Recommender: Suggests follow-up questions based on context
	4. Summarizer: Updates user profile and conversation memory

	## 🚀 Setup Instructions

	### Prerequisites
	- Python Version: 3.9 or higher
	- API Key: OpenAI API key or Nebius API key
	- For Hugging Face Spaces: Ensure your API key is set as a Space secret

	### Installation

	1. Clone the repository:
	```bash
	git clone <repository-url>
	cd Agents
	```

	2. Install dependencies:
	```bash
	pip install -r requirements.txt
	```

	3. Configure API Key:

	Create a `.env` file in the project root:
	```bash
	# For OpenAI (recommended)
	OPENAI_API_KEY=your_openai_api_key_here

	# OR for Nebius
	NEBIUS_API_KEY=your_nebius_api_key_here
	```

	4. Run the application:
	```bash
	streamlit run app.py
	```

	5. Access the app:
	Open your browser to `http://localhost:8501`

	### Alternative Deployment

	#### For Hugging Face Spaces:
	1. Fork or upload this repository to Hugging Face Spaces
2. Set your API key as a Space secret:
   - Go to your Space settings
   - Navigate to "Variables and secrets"
   - Add a secret named `NEBIUS_API_KEY` or `OPENAI_API_KEY`
   - Enter your API key as the value
	3. The app will start automatically

#### For other cloud deployments:
Set the API key as an environment variable:
	```bash
	export OPENAI_API_KEY=your_api_key_here
	# OR
	export NEBIUS_API_KEY=your_api_key_here
	```
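This README does not show how the app picks between the two keys. As a minimal sketch, assuming the app prefers OpenAI (marked "recommended" above) and falls back to Nebius, the selection could look like the code below; `resolve_api_key` is a hypothetical helper, not the app's code. Space secrets reach the app as environment variables, while a `.env` file must first be loaded into the process environment (for example with python-dotenv's `load_dotenv()`; whether `app.py` does this is an assumption).

```python
import os
from typing import Mapping, Optional, Tuple

# Checked in order: OpenAI first (marked "recommended"), then Nebius.
PROVIDERS = (("openai", "OPENAI_API_KEY"), ("nebius", "NEBIUS_API_KEY"))


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (provider, api_key) from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    for provider, var in PROVIDERS:
        key = env.get(var, "").strip()
        if key:  # skip unset or blank variables
            return provider, key
    raise RuntimeError(
        "No API key found: set OPENAI_API_KEY or NEBIUS_API_KEY "
        "in .env, as a Space secret, or as an exported environment variable."
    )
```

Calling `resolve_api_key()` with no argument reads the live process environment. Passing a dictionary makes the fallback order easy to check without touching real secrets.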

	## 🎯 Usage Guide

	### Query Types

	#### Structured Queries (Quantitative Analysis)
	- "How many records are in each category?"
	- "What are the most common customer issues?"
	- "Show me 5 examples of billing problems"
	- "Get distribution of intents"

	#### Unstructured Queries (Qualitative Analysis)
	- "Summarize the refund category"
	- "What patterns do you see in payment issues?"
	- "Analyze customer sentiment in billing conversations"
	- "What insights can you provide about technical support?"

	#### Memory & Recommendations
	- "What do you remember about me?"
	- "What should I query next?"
	- "Advise me what to explore"
	- "Recommend follow-up questions"
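This guide does not describe how the Classifier is implemented, and it is likely LLM-based. To make the three categories concrete, the hypothetical keyword router below sends each example query above to its matching branch. The cue lists are assumptions. Memory questions are also assumed to go to the Recommender branch, because this guide groups them with recommendations.

```python
from typing import Literal

QueryType = Literal["structured", "unstructured", "recommend"]

# Hypothetical cue lists; the app's classifier is probably LLM-based.
# Memory questions are grouped with recommendations, as in this guide.
RECOMMEND_CUES = ("remember", "query next", "what to explore", "recommend", "advise", "follow-up")
UNSTRUCTURED_CUES = ("summarize", "summary", "pattern", "sentiment", "insight", "analyze")


def classify_query(query: str) -> QueryType:
    """Route a query to the structured agent, unstructured agent, or recommender."""
    q = query.lower()
    if any(cue in q for cue in RECOMMEND_CUES):
        return "recommend"
    if any(cue in q for cue in UNSTRUCTURED_CUES):
        return "unstructured"
    # Counts, distributions, examples, and searches fall through to the structured agent.
    return "structured"
```

A keyword router is cheap and predictable, but it is brittle. For example, "Give me the gist of the refund tickets" contains no cue and falls through to the structured agent. An LLM classifier handles paraphrases like this one.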

	### Session Management

	#### Creating Sessions
	- New Session: Click "🆕 New Session" to start fresh
	- Auto-Generated: Each new browser session gets a unique ID

	#### Resuming Sessions
	1. Copy your session ID from the sidebar (e.g., `a1b2c3d4...`)
	2. Enter the full session ID in "Join Existing Session"
	3. Click "🔗 Join Session" to resume

	#### Cross-Tab Persistence
	- Open multiple tabs with the same session ID
- All tabs share the same conversation history; a tab shows new messages after it reruns or is refreshed
	- Memory and user profile persist
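Session IDs map onto LangGraph threads. The sketch below assumes each ID is a random UUID4 string. The sidebar example `a1b2c3d4...` is consistent with that, but the format is an assumption. Truncated IDs are rejected, which is why the full ID must be pasted. The `{"configurable": {"thread_id": ...}}` dictionary is LangGraph's standard way to select a thread when invoking a graph compiled with a checkpointer. The helper names are hypothetical.

```python
import uuid
from typing import Any, Dict


def new_session_id() -> str:
    """Create a fresh session ID (assumed here to be a random UUID4 string)."""
    return str(uuid.uuid4())


def normalize_session_id(raw: str) -> str:
    """Validate a pasted session ID; raises ValueError if it is not a full UUID."""
    # uuid.UUID accepts upper/lower case and surrounding braces; str() gives canonical form.
    return str(uuid.UUID(raw.strip()))


def thread_config(session_id: str) -> Dict[str, Any]:
    """Build the per-call config LangGraph uses to select a checkpointed thread."""
    return {"configurable": {"thread_id": session_id}}
```

With a compiled graph named `graph` (a hypothetical name), each turn would then run as `graph.invoke(inputs, config=thread_config(session_id))`. Every tab that passes the same ID reads and writes the same thread.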

	## 🧠 Memory System

	### User Profile Tracking
	The agent automatically tracks:
	- Interests: Topics and categories you frequently ask about
	- Expertise Level: Inferred from question complexity (beginner/intermediate/advanced)
	- Preferences: Analysis style preferences (quantitative vs qualitative)
	- Query History: Recent questions for context
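This guide states that the real profile is extracted by an LLM. The hypothetical `UserProfile` class below is a rule-based stand-in that shows the kind of bookkeeping involved:

- counting topic mentions
- tallying quantitative versus qualitative requests
- keeping a bounded query history

The topic and cue lists are assumptions. Expertise-level inference is left out because it depends on the LLM.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List

# Hypothetical topic and cue lists, taken from this guide's example queries.
TOPICS = ("billing", "refund", "payment", "technical support")
QUALITATIVE_CUES = ("summarize", "pattern", "sentiment", "insight")


@dataclass
class UserProfile:
    interests: Counter = field(default_factory=Counter)
    quantitative: int = 0
    qualitative: int = 0
    query_history: List[str] = field(default_factory=list)
    max_history: int = 10

    def update(self, query: str) -> None:
        """Fold one user query into the profile."""
        q = query.lower()
        # Interests: count each known topic the query mentions.
        self.interests.update(t for t in TOPICS if t in q)
        # Preferences: tally qualitative vs quantitative requests.
        if any(cue in q for cue in QUALITATIVE_CUES):
            self.qualitative += 1
        else:
            self.quantitative += 1
        # Query history: keep only the most recent queries.
        self.query_history = (self.query_history + [query])[-self.max_history:]

    @property
    def preference(self) -> str:
        if self.qualitative == self.quantitative:
            return "mixed"
        return "qualitative" if self.qualitative > self.quantitative else "quantitative"

    def top_interests(self, n: int = 3) -> List[str]:
        return [topic for topic, _ in self.interests.most_common(n)]
```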

	### Conversation Persistence
	- Thread-based: Each session has a unique thread ID
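- Checkpoint System: LangGraph saves a checkpoint of the graph state at every step, keyed by thread ID
- Cross-Session: Resume a conversation later by rejoining its session ID, as long as the app process has not restarted (the in-memory `MemorySaver` does not survive restarts, including a Hugging Face Space going to sleep)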
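To illustrate what thread-based checkpointing provides, here is a toy in-memory store keyed by thread ID. It is not LangGraph's `MemorySaver` API. It only models its behavior: each thread's snapshots are isolated from other threads, and everything lives in a Python dict, so it vanishes when the process exits.

```python
from copy import deepcopy
from typing import Any, Dict, List, Optional


class InMemoryThreadStore:
    """Toy stand-in for an in-memory checkpointer (NOT LangGraph's MemorySaver API)."""

    def __init__(self) -> None:
        # thread_id -> list of state snapshots, oldest first
        self._checkpoints: Dict[str, List[Dict[str, Any]]] = {}

    def save(self, thread_id: str, state: Dict[str, Any]) -> None:
        # Append a deep copy so later mutations by the caller cannot alter history.
        self._checkpoints.setdefault(thread_id, []).append(deepcopy(state))

    def latest(self, thread_id: str) -> Optional[Dict[str, Any]]:
        # Return a copy of the newest snapshot, or None for an unknown thread.
        history = self._checkpoints.get(thread_id)
        return deepcopy(history[-1]) if history else None

    def history_length(self, thread_id: str) -> int:
        return len(self._checkpoints.get(thread_id, []))
```

Swapping the dict for a database table is essentially what a persistent checkpointer does. This is why the Future Enhancements section proposes a PostgreSQL checkpointer.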

	### Memory Queries
	Ask the agent what it remembers:
	```
	"What do you remember about me?"
	"What are my interests?"
	"What have I asked about before?"
	```

	## 🔧 Testing the Agent

	### Basic Functionality Tests

	1. Classification Test:
	```
	Query: "How many categories are there?"
	Expected: Routes to Structured Agent → Uses get_dataset_stats tool
	```

	2. Follow-up Memory Test:
	```
	Query 1: "Show me billing examples"
	Query 2: "Show me more examples"
	Expected: Agent remembers previous context about billing
	```

	3. User Profile Test:
	```
	Query 1: "I'm interested in refund patterns"
	Query 2: "What do you remember about me?"
	Expected: Agent mentions interest in refunds
	```

	4. Recommendation Test:
	```
	Query: "What should I query next?"
	Expected: Personalized suggestions based on history
	```

	### Advanced Feature Tests

1. Session Persistence:
   - Ask a question, then reload the page
   - Verify that the conversation history remains
   - Verify that the user profile persists

2. Cross-Session Memory:
   - Note your session ID
   - Close the browser completely
   - Reopen it and join the same session
   - Verify that the full conversation and profile are restored

	3. Interactive Recommendations:
	```
	User: "Advise me what to query next"
	Agent: "Based on your interest in billing, you might want to analyze refund patterns."
	User: "I'd rather see examples instead"
	Agent: "Then I suggest showing 5 examples of refund requests."
	User: "Please do so"
	Expected: Agent executes the refined query
	```
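The app's recommender is LLM-driven. The code below is a hypothetical template-based sketch of the same idea. It builds suggestions from the user's interests and skips queries already asked. It can also be narrowed to a single style when the user refines the request, as in the "I'd rather see examples instead" turn above. `TEMPLATES` and `recommend` are illustrative names, not the app's code.

```python
from typing import Iterable, List, Optional

# Hypothetical templates keyed by analysis style (insertion order is preserved).
TEMPLATES = {
    "analysis": "What patterns do you see in {topic} conversations?",
    "examples": "Show me 5 examples of {topic} requests",
    "stats": "How many {topic} records are there?",
}


def recommend(
    interests: Iterable[str],
    asked: Iterable[str],
    style: Optional[str] = None,
    limit: int = 3,
) -> List[str]:
    """Suggest follow-up queries for the user's interests, skipping ones already asked."""
    asked_lower = {q.lower() for q in asked}
    # A refinement such as "examples instead" restricts suggestions to one style.
    styles = [style] if style else list(TEMPLATES)
    suggestions: List[str] = []
    for topic in interests:
        for s in styles:
            if len(suggestions) >= limit:
                return suggestions
            candidate = TEMPLATES[s].format(topic=topic)
            if candidate.lower() not in asked_lower:
                suggestions.append(candidate)
    return suggestions
```

Passing `style="examples"` for the refund interest yields "Show me 5 examples of refund requests", which is the refined suggestion from the dialogue above.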

	## 📁 File Structure

	```
	Agents/
	├── README.md # This file
	├── requirements.txt # Python dependencies
	├── .env # API keys (create this)
	├── app.py # LangGraph Streamlit app
	├── langgraph_agent.py # LangGraph agent implementation
	├── agent-memory.ipynb # Memory example notebook
	├── test_agent.py # Test suite
	└── DEPLOYMENT_GUIDE.md # Original deployment guide
	```

	## 🛠️ Technical Implementation

	### LangGraph Components

	State Management:
```python
from typing import Any, Dict, List, Optional, TypedDict


# Shared state passed between graph nodes
class AgentState(TypedDict):
    messages: List[Any]
    query_type: Optional[str]
    user_profile: Optional[Dict[str, Any]]
    session_context: Optional[Dict[str, Any]]
```

	Tool Categories:
	- Structured Tools: Statistics, distributions, examples, search
	- Unstructured Tools: Summaries, insights, pattern analysis
	- Memory Tools: Profile updates, preference tracking

	Graph Flow:
	1. Classifier: Determines query type
	2. Agent Selection: Routes to appropriate specialist
	3. Tool Execution: Dynamic tool usage based on needs
	4. Memory Update: Profile and context updates
	5. Response Generation: Final answer with memory integration
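The five steps above can be modeled in plain Python as a sketch. In the app they are LangGraph nodes, presumably wired with `StateGraph.add_node`, with `add_conditional_edges` handling the classifier's routing and a checkpointer passed to `compile()`. In the sketch below, every node is a hypothetical stub function that takes and returns a state dict, so the routing and state flow can run without an LLM.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
Node = Callable[[State], State]


def classify(query: str) -> str:
    # Hypothetical stand-in for the LLM classifier node.
    return "unstructured" if "summarize" in query.lower() else "structured"


def structured_agent(state: State) -> State:
    # Stub: a real agent would call tools such as get_dataset_stats here.
    return {**state, "messages": state["messages"] + [("assistant", "structured answer")]}


def unstructured_agent(state: State) -> State:
    # Stub: a real agent would call summary or pattern-analysis tools here.
    return {**state, "messages": state["messages"] + [("assistant", "unstructured answer")]}


def summarizer(state: State) -> State:
    # Memory update: count turns in the profile (a placeholder for LLM synthesis).
    profile = dict(state.get("user_profile") or {})
    profile["turns"] = profile.get("turns", 0) + 1
    return {**state, "user_profile": profile}


AGENTS: Dict[str, Node] = {"structured": structured_agent, "unstructured": unstructured_agent}


def run_turn(query: str, state: State) -> State:
    """One pass through the flow: classify, dispatch, update memory."""
    # Append the user message without mutating the caller's state.
    state = {**state, "messages": state.get("messages", []) + [("user", query)]}
    # 1. Classifier: determine the query type.
    state["query_type"] = classify(query)
    # 2-3. Agent selection; tool execution happens inside the chosen agent.
    state = AGENTS[state["query_type"]](state)
    # 4-5. Memory update; the last assistant message is the response.
    return summarizer(state)


state = run_turn("How many categories are there?", {})
state = run_turn("Summarize the refund category", state)
print(state["query_type"], state["user_profile"])  # unstructured {'turns': 2}
```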

	### Memory Architecture

- Checkpointer: LangGraph's `MemorySaver` for conversation persistence; it keeps checkpoints in process memory, so they are lost when the app restarts
- Thread Management: Unique thread IDs for session isolation
- Profile Synthesis: LLM-powered extraction of user characteristics
- Context Retention: Full conversation history with temporal awareness

	## 🔍 Troubleshooting

	### Common Issues

1. API Key Errors:
   - Verify that the `.env` file exists and contains the correct key
   - Check that the environment variable (or Space secret) is set in your deployment
   - Ensure the API key has sufficient credits

2. Memory Not Persisting:
   - Verify that the session ID stays the same
   - Check that browser localStorage is not being cleared
   - Ensure `thread_id` is passed correctly in the graph config
   - Remember that `MemorySaver` keeps history in memory only, so restarting the app (or the Space going to sleep) clears it

3. Dataset Loading Issues:
   - Check your internet connection (the dataset is downloaded from the Hugging Face Hub)
   - Verify that the `datasets` library is installed
   - Try clearing the Streamlit cache: `streamlit cache clear`

4. Tool Execution Errors:
   - Verify that all dependencies in `requirements.txt` are installed
   - Check that the dataset loaded correctly
   - Review error messages in the Streamlit interface

	### Debug Mode

Enable debug logging with:
	```python
	import logging
	logging.basicConfig(level=logging.DEBUG)
	```
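`logging.basicConfig` does nothing if the root logger already has handlers, so a variant with `force=True` (Python 3.8+) is more reliable. Global DEBUG also surfaces every HTTP request made by the OpenAI client, which is built on `httpx`. The sketch below quiets those libraries while leaving the app's own debug output on. The logger name `langgraph_agent` is an assumption based on the module name.

```python
import logging

# force=True removes any handlers already attached to the root logger,
# which would otherwise make basicConfig a silent no-op.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
    force=True,
)

# Keep HTTP client chatter at WARNING; child loggers inherit this level.
for noisy in ("httpx", "httpcore", "urllib3"):
    logging.getLogger(noisy).setLevel(logging.WARNING)

# Hypothetical app logger (what logging.getLogger(__name__) gives in langgraph_agent.py).
logger = logging.getLogger("langgraph_agent")
logger.debug("debug logging enabled")
```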

	## 🎓 Learning Objectives

	This implementation demonstrates:

	1. LangGraph Multi-Agent Systems: Specialized agents for different query types
	2. Memory & Persistence: Conversation continuity across sessions
	3. Tool Integration: Dynamic tool selection and execution
	4. State Management: Complex state updates and routing
	5. User Experience: Session management and interactive features

	## 🚀 Future Enhancements

	Potential improvements:
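- Database Persistence: Replace `MemorySaver` with a PostgreSQL checkpointer (e.g. `PostgresSaver` from the `langgraph-checkpoint-postgres` package) so conversations survive restarts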
	- Advanced Analytics: More sophisticated data analysis tools
	- Export Features: PDF/CSV report generation
	- User Authentication: Multi-user support with profiles
	- Real-time Collaboration: Shared sessions between users

	## 📄 License

The Space metadata declares the MIT license. The project was created for educational purposes as part of a data science curriculum.

	## 🤝 Contributing

	This is an assignment project. For questions or issues, please contact the course instructors.

	---

	Built with: LangGraph, Streamlit, OpenAI/Nebius, Hugging Face Datasets