# AWS Load Balancer Setup Guide

## DFS Portfolio Manager - Production Deployment

### Overview

This guide documents the complete process of migrating from a single EC2 instance to a load-balanced architecture with 3 instances for improved performance, reliability, and cost efficiency.

---

## 📊 Architecture Comparison

### Before (Single Instance)

- **Instance**: 1x m5.xlarge (4 vCPUs, 16GB RAM)
- **Cost**: ~$280/month
- **Issues**: Memory crashes, single point of failure
- **SSL**: Certbot/Let's Encrypt on instance

### After (Load Balanced)

- **Instances**: 3x m5.large (2 vCPUs, 8GB RAM each)
- **Load Balancer**: Application Load Balancer (ALB)
- **Cost**: ~$223/month (about 20% savings)
- **Benefits**: Better performance, auto-scaling, high availability
- **SSL**: AWS Certificate Manager (free, auto-renewing)

---

## 🚀 Complete Setup Process

### Prerequisites

- AWS CLI configured with appropriate permissions
- Existing EC2 instance with working Streamlit application
- Domain managed by Cloudflare
- PowerShell (Windows) or Bash (Linux/Mac)

---

### Step 1: Gather Current Instance Information

```powershell
# Get current instance ID (takes the first running instance; if several are
# running, use the listing below to pick the right one)
$INSTANCE_ID = aws ec2 describe-instances --filters "Name=instance-state-name,Values=running" --query 'Reservations[0].Instances[0].InstanceId' --output text
Write-Host "Current Instance ID: $INSTANCE_ID"

# List all running instances if needed
aws ec2 describe-instances --filters "Name=instance-state-name,Values=running" --query 'Reservations[*].Instances[*].[InstanceId,InstanceType,Tags[?Key==`Name`].Value|[0],PublicIpAddress]' --output table
```

---

### Step 2: Create AMI from Current Instance

```powershell
# Create AMI snapshot of optimized setup
$AMI_ID = aws ec2 create-image --instance-id $INSTANCE_ID --name "DFS-Portfolio-Manager-$(Get-Date -Format 'yyyyMMdd-HHmm')" --description "DFS Portfolio Manager with memory optimizations and supervisord" --no-reboot --query 'ImageId' --output text
Write-Host "Creating AMI: $AMI_ID"
Write-Host "This will take 5-10 minutes..."

# Wait for AMI to be ready
aws ec2 wait image-available --image-ids $AMI_ID
Write-Host "AMI is ready!"
```

---

### Step 3: Extract Network Configuration

```powershell
# Get VPC, subnet, and security group info
$VPC_ID = aws ec2 describe-instances --instance-ids $INSTANCE_ID --query 'Reservations[0].Instances[0].VpcId' --output text
$SUBNET_IDS = aws ec2 describe-subnets --filters "Name=vpc-id,Values=$VPC_ID" --query 'Subnets[*].SubnetId' --output text
$SECURITY_GROUP_ID = aws ec2 describe-instances --instance-ids $INSTANCE_ID --query 'Reservations[0].Instances[0].SecurityGroups[0].GroupId' --output text
Write-Host "VPC ID: $VPC_ID"
Write-Host "Subnet IDs: $SUBNET_IDS"
Write-Host "Security Group: $SECURITY_GROUP_ID"
```

---

### Step 4: Create Target Group

```powershell
# Create target group for health checks
$TARGET_GROUP_ARN = aws elbv2 create-target-group --name "portfolio-manager-targets" --protocol HTTP --port 5000 --vpc-id $VPC_ID --health-check-path "/" --health-check-interval-seconds 30 --health-check-timeout-seconds 10 --healthy-threshold-count 2 --unhealthy-threshold-count 3 --query 'TargetGroups[0].TargetGroupArn' --output text
Write-Host "Target Group ARN: $TARGET_GROUP_ARN"
```

**Target Group Configuration:**

- **Port**: 5000 (Streamlit application port)
- **Health Check**: Every 30 seconds at root path "/" (10-second timeout)
- **Healthy Threshold**: 2 consecutive successful checks
- **Unhealthy Threshold**: 3 consecutive failed checks

---

### Step 5: Create Application Load Balancer

```powershell
# Create ALB (an ALB needs subnets in at least two Availability Zones, one subnet per zone)
$ALB_ARN = aws elbv2 create-load-balancer --name "portfolio-manager-alb" --subnets $SUBNET_IDS.Split() --security-groups $SECURITY_GROUP_ID --scheme internet-facing --type application --ip-address-type ipv4 --query 'LoadBalancers[0].LoadBalancerArn' --output text

# Get ALB DNS name
$ALB_DNS = aws elbv2 describe-load-balancers --load-balancer-arns $ALB_ARN --query 'LoadBalancers[0].DNSName' --output text
Write-Host "ALB ARN: $ALB_ARN"
Write-Host "ALB DNS: $ALB_DNS"
```

---

### Step 6: Create HTTP Listener

```powershell
# Create HTTP listener
# (comma-separated shorthand is quoted so PowerShell passes it as a single argument)
aws elbv2 create-listener --load-balancer-arn $ALB_ARN --protocol HTTP --port 80 --default-actions "Type=forward,TargetGroupArn=$TARGET_GROUP_ARN"
Write-Host "HTTP Listener created successfully"
```

---

### Step 7: Create Launch Template

```powershell
# User data must be base64-encoded in launch template data.
# Assumption: the boot script only needs to restart the supervisord-managed
# "streamlit" program; adjust it to match your instance setup.
$UserDataScript = "#!/bin/bash`nsupervisorctl restart streamlit`n"
$USER_DATA = [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes($UserDataScript))

# Create JSON content for launch template
$JsonContent = @"
{
  "ImageId": "$AMI_ID",
  "InstanceType": "m5.large",
  "SecurityGroupIds": ["$SECURITY_GROUP_ID"],
  "UserData": "$USER_DATA",
  "TagSpecifications": [
    {
      "ResourceType": "instance",
      "Tags": [
        { "Key": "Name", "Value": "Portfolio-Manager-Instance" }
      ]
    }
  ]
}
"@

# Save launch template data (full path, because .NET's working directory can differ from PowerShell's)
[System.IO.File]::WriteAllText((Join-Path $PWD "launch-template.json"), $JsonContent)

# Create launch template
$LAUNCH_TEMPLATE_ID = aws ec2 create-launch-template --launch-template-name "portfolio-manager-template" --launch-template-data file://launch-template.json --query 'LaunchTemplate.LaunchTemplateId' --output text
Write-Host "Launch Template ID: $LAUNCH_TEMPLATE_ID"
```

**Launch Template Configuration:**

- **Instance Type**: m5.large (2 vCPUs, 8GB RAM)
- **User Data**: Automatically restarts Streamlit service on boot
- **Application Path**: `/home/ec2-user/AWS_Portfolio_Manager`

---

### Step 8: Create Auto Scaling Group

```powershell
# Convert subnet IDs to comma-separated format
$SUBNET_LIST = $SUBNET_IDS -replace '\s+', ','

# Create Auto Scaling Group (the backtick keeps PowerShell from expanding $Latest)
aws autoscaling create-auto-scaling-group --auto-scaling-group-name "portfolio-manager-asg" --launch-template "LaunchTemplateId=$LAUNCH_TEMPLATE_ID,Version=`$Latest" --min-size 3 --max-size 5 --desired-capacity 3 --target-group-arns $TARGET_GROUP_ARN --health-check-type ELB --health-check-grace-period 300 --vpc-zone-identifier $SUBNET_LIST

# Add tags to ASG
aws autoscaling create-or-update-tags --tags "ResourceId=portfolio-manager-asg,ResourceType=auto-scaling-group,Key=Name,Value=Portfolio-Manager-ASG,PropagateAtLaunch=true"
Write-Host "Auto Scaling Group created with 3 instances"
```

**Auto Scaling Group Configuration:**

- **Desired Capacity**: 3 instances
- **Minimum**: 3 instances
- **Maximum**: 5 instances
- **Health Check**: ELB-based (instances that fail the load balancer health check are replaced, not only those that fail EC2 status checks)
- **Grace Period**: 5 minutes for instances to be ready

---

### Step 9: Enable Sticky Sessions

```powershell
# Enable sticky sessions for session state management
aws elbv2 modify-target-group-attributes --target-group-arn $TARGET_GROUP_ARN --attributes "Key=stickiness.enabled,Value=true" "Key=stickiness.lb_cookie.duration_seconds,Value=86400"
Write-Host "Sticky sessions enabled - users stay on same instance for 24 hours"
```

**Why Sticky Sessions are Critical:**

- Streamlit stores user session state locally on each instance
- File uploads and user data must stay on the same instance
- Without sticky sessions: 400 errors on file uploads
- Duration: 24 hours (86400 seconds)

---

### Step 10: Set Up SSL Certificate

```powershell
# Request SSL certificate from AWS Certificate Manager
$CERT_ARN = aws acm request-certificate --domain-name "portfolio-manager.paydirtapps.com" --validation-method DNS --query 'CertificateArn' --output text
Write-Host "Certificate ARN: $CERT_ARN"

# Get DNS validation records
aws acm describe-certificate --certificate-arn $CERT_ARN --query 'Certificate.DomainValidationOptions[0].ResourceRecord.[Name,Value,Type]' --output table
```

**DNS Validation Process:**

1. Add the CNAME record shown in the output to Cloudflare:
   - **Name**: `_[hash].portfolio-manager` (without the domain suffix)
   - **Value**: `_[hash].xlfgrmvvlj.acm-validations.aws.`
   - **Proxy Status**: DNS only (gray cloud, not orange)
2. Wait 5-15 minutes for validation

```powershell
# Check certificate status
aws acm describe-certificate --certificate-arn $CERT_ARN --query 'Certificate.Status' --output text
# Should show "ISSUED" when ready
```

---

### Step 11: Add HTTPS Listener

```powershell
# Add HTTPS listener once certificate is issued
aws elbv2 create-listener --load-balancer-arn $ALB_ARN --protocol HTTPS --port 443 --certificates CertificateArn=$CERT_ARN --default-actions "Type=forward,TargetGroupArn=$TARGET_GROUP_ARN"
Write-Host "HTTPS listener added successfully!"

# Optional: Redirect HTTP to HTTPS
$HTTP_LISTENER_ARN = aws elbv2 describe-listeners --load-balancer-arn $ALB_ARN --query 'Listeners[?Port==`80`].ListenerArn' --output text
aws elbv2 modify-listener --listener-arn $HTTP_LISTENER_ARN --default-actions "Type=redirect,RedirectConfig={Protocol=HTTPS,Port=443,StatusCode=HTTP_301}"
Write-Host "HTTP to HTTPS redirect enabled!"
```

---

### Step 12: Update Cloudflare DNS

**Final DNS Configuration:**

1. Go to Cloudflare Dashboard → Your domain → DNS
2. Update the main record for `portfolio-manager.paydirtapps.com`:
   - **Type**: CNAME
   - **Name**: `portfolio-manager`
   - **Target**: `[your-alb-dns-name].us-east-2.elb.amazonaws.com`
   - **Proxy Status**: DNS only (gray cloud)
   - **TTL**: Auto

---

## 🔧 Verification Commands

### Check Instance Health

```powershell
# Check Auto Scaling Group status
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names "portfolio-manager-asg" --query 'AutoScalingGroups[0].Instances[*].[InstanceId,LifecycleState,HealthStatus]' --output table

# Check target health
aws elbv2 describe-target-health --target-group-arn $TARGET_GROUP_ARN --query 'TargetHealthDescriptions[*].[Target.Id,TargetHealth.State,TargetHealth.Description]' --output table
```

### Check Load Balancer Configuration

```powershell
# Verify listeners
aws elbv2 describe-listeners --load-balancer-arn $ALB_ARN --query 'Listeners[*].[Port,Protocol]' --output table

# Check load balancer attributes
aws elbv2 describe-load-balancer-attributes --load-balancer-arn $ALB_ARN --output table
```

---

## 📈 Scaling Options

### Manual Scaling

```powershell
# Scale up to 5 instances during peak hours
aws autoscaling set-desired-capacity --auto-scaling-group-name "portfolio-manager-asg" --desired-capacity 5

# Scale back down to 3 instances during off-hours
# (3 is the group's --min-size; going lower requires reducing the minimum first)
aws autoscaling set-desired-capacity --auto-scaling-group-name "portfolio-manager-asg" --desired-capacity 3
```

### Automatic CPU-Based Scaling

```powershell
# Set up auto-scaling based on CPU usage
# (target tracking adds and removes instances to hold average CPU near 70%)
aws autoscaling put-scaling-policy --auto-scaling-group-name "portfolio-manager-asg" --policy-name "scale-up" --policy-type "TargetTrackingScaling" --target-tracking-configuration "TargetValue=70.0,PredefinedMetricSpecification={PredefinedMetricType=ASGAverageCPUUtilization}"
```

### Scheduled Scaling

```powershell
# Scale up during peak hours (recurrence is in UTC: 22:00 UTC = 6 PM EDT / 5 PM EST)
aws autoscaling put-scheduled-update-group-action --auto-scaling-group-name "portfolio-manager-asg" --scheduled-action-name "evening-scale-up" --recurrence "0 22 * * *" --desired-capacity 5

# Scale down during off hours (06:00 UTC = 2 AM EDT / 1 AM EST), back to the group minimum of 3
aws autoscaling put-scheduled-update-group-action --auto-scaling-group-name "portfolio-manager-asg" --scheduled-action-name "morning-scale-down" --recurrence "0 6 * * *" --desired-capacity 3
```

---

## 🔄 Instance Type Changes

### Upgrade to Larger Instances

```powershell
# Create new launch template version with m5.xlarge
# (the JSON goes through a file so PowerShell does not strip the embedded quotes)
$JsonContent = @"
{
  "ImageId": "$AMI_ID",
  "InstanceType": "m5.xlarge",
  "SecurityGroupIds": ["$SECURITY_GROUP_ID"],
  "UserData": "$USER_DATA"
}
"@
[System.IO.File]::WriteAllText((Join-Path $PWD "launch-template-xlarge.json"), $JsonContent)
aws ec2 create-launch-template-version --launch-template-id $LAUNCH_TEMPLATE_ID --launch-template-data file://launch-template-xlarge.json

# Update ASG to use new version
aws autoscaling update-auto-scaling-group --auto-scaling-group-name "portfolio-manager-asg" --launch-template "LaunchTemplateId=$LAUNCH_TEMPLATE_ID,Version=`$Latest"

# Force refresh to replace all instances
aws autoscaling start-instance-refresh --auto-scaling-group-name "portfolio-manager-asg"
```

---

## 💰 Cost Analysis

### Monthly Costs (US East 2)

| Component | Before | After | Savings |
|-----------|--------|-------|---------|
| **Compute** | 1x m5.xlarge: $280 | 3x m5.large: $207 | $73 |
| **Load Balancer** | None: $0 | ALB: $16 | -$16 |
| **SSL Certificate** | Let's Encrypt: $0 | ACM: $0 | $0 |
| **Total** | **$280/month** | **$223/month** | **$57/month (20% savings)** |

### Additional Benefits

- **Performance**: 6 total vCPUs vs 4 vCPUs (50% more processing power)
- **Reliability**: 3 instances vs 1 instance (high availability)
- **Memory**: 24GB total vs 16GB total (50% more memory)
- **Auto-scaling**: Can scale up to 5 instances during peak times

---

## 🚨 Troubleshooting

### Common Issues

#### 1. File Upload 400 Errors

**Symptom**: First upload fails, retry succeeds

**Cause**: User routed to different instance without session state

**Solution**: Enable sticky sessions

```powershell
aws elbv2 modify-target-group-attributes --target-group-arn $TARGET_GROUP_ARN --attributes "Key=stickiness.enabled,Value=true" "Key=stickiness.lb_cookie.duration_seconds,Value=86400"
```

#### 2. Certificate "Not Secure" Warning

**Symptom**: Browser shows "Not Secure" despite valid certificate

**Cause**: Accessing load balancer DNS instead of domain name

**Solution**: Update Cloudflare DNS to point domain to load balancer

#### 3. Health Check Failures

**Symptom**: Instances show "unhealthy" in target group

**Cause**: Streamlit not responding on port 5000

**Solution**: Check supervisord status on instances

```bash
# SSH into instance
sudo supervisorctl status streamlit
sudo supervisorctl restart streamlit
```

#### 4. SSL Certificate Validation Stuck

**Symptom**: Certificate stays "PENDING_VALIDATION" for hours

**Cause**: DNS validation record not added correctly

**Solution**: Verify CNAME record in Cloudflare, ensure "DNS only" (gray cloud)

---

## 🔧 Maintenance Commands

### Update Application Code

```powershell
# Create new AMI with updated code
# ($UPDATED_INSTANCE_ID is the instance on which the new code was deployed)
$NEW_AMI_ID = aws ec2 create-image --instance-id $UPDATED_INSTANCE_ID --name "DFS-Portfolio-Manager-$(Get-Date -Format 'yyyyMMdd-HHmm')" --description "Updated application code" --no-reboot --query 'ImageId' --output text

# Update launch template: base the new version on the latest one so instance type,
# security groups, and user data are inherited, and override only the AMI
$LATEST_VERSION = aws ec2 describe-launch-templates --launch-template-ids $LAUNCH_TEMPLATE_ID --query 'LaunchTemplates[0].LatestVersionNumber' --output text
aws ec2 create-launch-template-version --launch-template-id $LAUNCH_TEMPLATE_ID --source-version $LATEST_VERSION --launch-template-data "ImageId=$NEW_AMI_ID"

# Refresh instances with new code
aws autoscaling start-instance-refresh --auto-scaling-group-name "portfolio-manager-asg"
```

### Monitor Performance

```powershell
# CloudWatch expects ISO 8601 timestamps; build them in UTC
$START_TIME = (Get-Date).ToUniversalTime().AddHours(-1).ToString("yyyy-MM-ddTHH:mm:ssZ")
$END_TIME = (Get-Date).ToUniversalTime().ToString("yyyy-MM-ddTHH:mm:ssZ")

# Check CPU utilization
aws cloudwatch get-metric-statistics --namespace AWS/EC2 --metric-name CPUUtilization --dimensions "Name=AutoScalingGroupName,Value=portfolio-manager-asg" --statistics Average --start-time $START_TIME --end-time $END_TIME --period 300

# Check load balancer metrics
aws cloudwatch get-metric-statistics --namespace AWS/ApplicationELB --metric-name RequestCount --dimensions "Name=LoadBalancer,Value=app/portfolio-manager-alb/[load-balancer-id]" --statistics Sum --start-time $START_TIME --end-time $END_TIME --period 300
```

### Cleanup Old Resources

```powershell
# Stop original single instance (once everything is working)
aws ec2 stop-instances --instance-ids $INSTANCE_ID

# List application AMIs so old ones can be identified and deregistered (keep recent ones)
aws ec2 describe-images --owners self --filters "Name=name,Values=DFS-Portfolio-Manager*" --query 'Images[*].[ImageId,Name,CreationDate]' --output table
```

---

## 📝 Key Variables Reference

Save these variables for future maintenance:

```powershell
# Core Infrastructure
$INSTANCE_ID = "i-xxxxxxxxx"                       # Original instance
$AMI_ID = "ami-xxxxxxxxx"                          # Application AMI
$VPC_ID = "vpc-xxxxxxxxx"                          # Virtual Private Cloud
$SECURITY_GROUP_ID = "sg-xxxxxxxxx"                # Security Group
$SUBNET_IDS = "subnet-xxx subnet-yyy subnet-zzz"   # Subnets

# Load Balancer
$ALB_ARN = "arn:aws:elasticloadbalancing:us-east-2:xxxx:loadbalancer/app/portfolio-manager-alb/xxxxxxxxxx"
$ALB_DNS = "portfolio-manager-alb-xxxxxxxxxx.us-east-2.elb.amazonaws.com"
$TARGET_GROUP_ARN = "arn:aws:elasticloadbalancing:us-east-2:xxxx:targetgroup/portfolio-manager-targets/xxxxxxxxxx"

# Auto Scaling
$LAUNCH_TEMPLATE_ID = "lt-xxxxxxxxx"               # Launch Template
$ASG_NAME = "portfolio-manager-asg"                # Auto Scaling Group

# SSL
$CERT_ARN = "arn:aws:acm:us-east-2:xxxx:certificate/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
```

---

## 🎯 Success Metrics

After completing this setup, you should have:

- ✅ **3 healthy instances** running your application
- ✅ **Load balancer** distributing traffic evenly
- ✅ **HTTPS/SSL** working with valid certificate
- ✅ **Sticky sessions** preventing upload errors
- ✅ **Auto-scaling** capability (3-5 instances)
- ✅ **High availability** across multiple availability zones
- ✅ **Cost savings** of ~20% compared to single large instance
- ✅ **Better performance** with 50% more total CPU and memory

---

## 📞 Support

For issues or questions:

1. Check the troubleshooting section above
2. Verify all health checks are passing
3. Review AWS CloudWatch logs for detailed error information
4. Ensure Cloudflare DNS settings are correct

---

*Architecture: AWS Application Load Balancer + Auto Scaling Group*

*Application: DFS Portfolio Manager (Streamlit)*
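
For Bash users (Bash is listed in the prerequisites), the "verify all health checks are passing" item in the Support list could be scripted roughly as follows. This is a minimal sketch rather than part of the original procedure: the function names `all_targets_healthy` and `check_target_group` are hypothetical, and it assumes `TARGET_GROUP_ARN` is set in the shell to the value recorded in the Key Variables Reference.

```bash
#!/usr/bin/env bash

# Succeeds (exit 0) only if stdin contains at least one state and every
# whitespace-separated state is exactly "healthy".
all_targets_healthy() {
  local state count=0
  for state in $(cat); do
    count=$((count + 1))
    if [ "$state" != "healthy" ]; then
      return 1
    fi
  done
  [ "$count" -gt 0 ]
}

# Hypothetical wrapper: asks the target group for each target's state and
# applies the check above. Assumes TARGET_GROUP_ARN is set and the AWS CLI
# is configured. It is only defined here, not called.
check_target_group() {
  aws elbv2 describe-target-health \
    --target-group-arn "$TARGET_GROUP_ARN" \
    --query 'TargetHealthDescriptions[*].TargetHealth.State' \
    --output text | all_targets_healthy
}
```

Running `check_target_group && echo "all targets healthy"` after an instance refresh gives a quick pass/fail signal. An empty target list is deliberately treated as a failure, so a group with no registered instances is not reported as healthy.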