James McCool

AWS Load Balancer Setup Guide

DFS Portfolio Manager - Production Deployment

Overview

This guide documents the complete process of migrating from a single EC2 instance to a load-balanced architecture with 3 instances for improved performance, reliability, and cost efficiency.


📊 Architecture Comparison

Before (Single Instance)

  • Instance: 1x m5.xlarge (4 vCPUs, 16GB RAM)
  • Cost: ~$280/month
  • Issues: Memory crashes, single point of failure
  • SSL: Certbot/Let's Encrypt on instance

After (Load Balanced)

  • Instances: 3x m5.large (2 vCPUs, 8GB RAM each)
  • Load Balancer: Application Load Balancer (ALB)
  • Cost: ~$223/month (20% savings)
  • Benefits: Better performance, auto-scaling, high availability
  • SSL: AWS Certificate Manager (free, auto-renewing)

🚀 Complete Setup Process

Prerequisites

  • AWS CLI configured with appropriate permissions
  • Existing EC2 instance with working Streamlit application
  • Domain managed by Cloudflare
  • PowerShell (Windows) or Bash (Linux/Mac)

Step 1: Gather Current Instance Information

# Get current instance ID
$INSTANCE_ID = aws ec2 describe-instances --filters "Name=instance-state-name,Values=running" --query 'Reservations[0].Instances[0].InstanceId' --output text

Write-Host "Current Instance ID: $INSTANCE_ID"

# List all running instances if needed
aws ec2 describe-instances --filters "Name=instance-state-name,Values=running" --query 'Reservations[*].Instances[*].[InstanceId,InstanceType,Tags[?Key==`Name`].Value|[0],PublicIpAddress]' --output table

Step 2: Create AMI from Current Instance

# Create AMI snapshot of optimized setup
$AMI_ID = aws ec2 create-image --instance-id $INSTANCE_ID --name "DFS-Portfolio-Manager-$(Get-Date -Format 'yyyyMMdd-HHmm')" --description "DFS Portfolio Manager with memory optimizations and supervisord" --no-reboot --query 'ImageId' --output text

Write-Host "Creating AMI: $AMI_ID"
Write-Host "This will take 5-10 minutes..."

# Wait for AMI to be ready
aws ec2 wait image-available --image-ids $AMI_ID
Write-Host "AMI is ready!"

Step 3: Extract Network Configuration

# Get VPC, subnet, and security group info
$VPC_ID = aws ec2 describe-instances --instance-ids $INSTANCE_ID --query 'Reservations[0].Instances[0].VpcId' --output text

$SUBNET_IDS = aws ec2 describe-subnets --filters "Name=vpc-id,Values=$VPC_ID" --query 'Subnets[*].SubnetId' --output text

$SECURITY_GROUP_ID = aws ec2 describe-instances --instance-ids $INSTANCE_ID --query 'Reservations[0].Instances[0].SecurityGroups[0].GroupId' --output text

Write-Host "VPC ID: $VPC_ID"
Write-Host "Subnet IDs: $SUBNET_IDS"
Write-Host "Security Group: $SECURITY_GROUP_ID"

Step 4: Create Target Group

# Create target group for health checks
$TARGET_GROUP_ARN = aws elbv2 create-target-group --name "portfolio-manager-targets" --protocol HTTP --port 5000 --vpc-id $VPC_ID --health-check-path "/" --health-check-interval-seconds 30 --health-check-timeout-seconds 10 --healthy-threshold-count 2 --unhealthy-threshold-count 3 --query 'TargetGroups[0].TargetGroupArn' --output text

Write-Host "Target Group ARN: $TARGET_GROUP_ARN"

Target Group Configuration:

  • Port: 5000 (Streamlit application port)
  • Health Check: Every 30 seconds at root path "/"
  • Healthy Threshold: 2 consecutive successful checks
  • Unhealthy Threshold: 3 consecutive failed checks
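
As a rough check on the settings above, the time a target takes to change state is about one check interval per required consecutive result. The following Python sketch only does that arithmetic; the function name is illustrative, and real timing also depends on when the first check happens to land.

```python
def seconds_to_state_change(interval_s: int, threshold: int) -> int:
    """Approximate seconds for a target to flip state: one check per interval,
    and `threshold` consecutive results are needed."""
    return interval_s * threshold


INTERVAL_S = 30          # --health-check-interval-seconds
HEALTHY_THRESHOLD = 2    # --healthy-threshold-count
UNHEALTHY_THRESHOLD = 3  # --unhealthy-threshold-count

time_to_healthy = seconds_to_state_change(INTERVAL_S, HEALTHY_THRESHOLD)
time_to_unhealthy = seconds_to_state_change(INTERVAL_S, UNHEALTHY_THRESHOLD)

print(f"healthy after ~{time_to_healthy}s, unhealthy after ~{time_to_unhealthy}s")
```

With these values a new instance needs roughly a minute of passing checks before it receives traffic, which fits inside the 5-minute grace period used later for the Auto Scaling Group.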

Step 5: Create Application Load Balancer

# Create ALB
$ALB_ARN = aws elbv2 create-load-balancer --name "portfolio-manager-alb" --subnets $SUBNET_IDS.Split() --security-groups $SECURITY_GROUP_ID --scheme internet-facing --type application --ip-address-type ipv4 --query 'LoadBalancers[0].LoadBalancerArn' --output text

# Get ALB DNS name
$ALB_DNS = aws elbv2 describe-load-balancers --load-balancer-arns $ALB_ARN --query 'LoadBalancers[0].DNSName' --output text

Write-Host "ALB ARN: $ALB_ARN"
Write-Host "ALB DNS: $ALB_DNS"

Step 6: Create HTTP Listener

# Create HTTP listener
aws elbv2 create-listener --load-balancer-arn $ALB_ARN --protocol HTTP --port 80 --default-actions Type=forward,TargetGroupArn=$TARGET_GROUP_ARN

Write-Host "HTTP Listener created successfully"

Step 7: Create Launch Template

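# $USER_DATA must already hold the base64-encoded user data script; none of the earlier steps set it
# Create JSON content for launch template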
$JsonContent = @"
{
    "ImageId": "$AMI_ID",
    "InstanceType": "m5.large",
    "SecurityGroupIds": ["$SECURITY_GROUP_ID"],
    "UserData": "$USER_DATA",
    "TagSpecifications": [
        {
            "ResourceType": "instance",
            "Tags": [
                {
                    "Key": "Name",
                    "Value": "Portfolio-Manager-Instance"
                }
            ]
        }
    ]
}
"@

# Save launch template data (absolute path, because .NET's working directory can differ from PowerShell's)
[System.IO.File]::WriteAllText((Join-Path $PWD "launch-template.json"), $JsonContent)

# Create launch template
$LAUNCH_TEMPLATE_ID = aws ec2 create-launch-template --launch-template-name "portfolio-manager-template" --launch-template-data file://launch-template.json --query 'LaunchTemplate.LaunchTemplateId' --output text

Write-Host "Launch Template ID: $LAUNCH_TEMPLATE_ID"

Launch Template Configuration:

  • Instance Type: m5.large (2 vCPUs, 8GB RAM)
  • User Data: Automatically restarts Streamlit service on boot
  • Application Path: /home/ec2-user/AWS_Portfolio_Manager

Step 8: Create Auto Scaling Group

# Convert subnet IDs to comma-separated format
$SUBNET_LIST = $SUBNET_IDS -replace '\s+', ','

# Create Auto Scaling Group
aws autoscaling create-auto-scaling-group --auto-scaling-group-name "portfolio-manager-asg" --launch-template LaunchTemplateId=$LAUNCH_TEMPLATE_ID,Version='$Latest' --min-size 3 --max-size 5 --desired-capacity 3 --target-group-arns $TARGET_GROUP_ARN --health-check-type ELB --health-check-grace-period 300 --vpc-zone-identifier $SUBNET_LIST

# Add tags to ASG
aws autoscaling create-or-update-tags --tags ResourceId=portfolio-manager-asg,ResourceType=auto-scaling-group,Key=Name,Value=Portfolio-Manager-ASG,PropagateAtLaunch=true

Write-Host "Auto Scaling Group created with 3 instances"

Auto Scaling Group Configuration:

  • Desired Capacity: 3 instances
  • Minimum: 3 instances
  • Maximum: 5 instances
  • Health Check: ELB-based (instances that fail the ALB health check are replaced, not only those failing EC2 status checks)
  • Grace Period: 5 minutes for instances to be ready
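
The desired capacity of an Auto Scaling Group has to stay between its minimum and maximum, so any later scaling command needs to respect the 3-5 range set here or change the bounds in the same call. A small Python sketch of that rule follows; `validate_capacity` is an illustrative helper and not part of any AWS SDK.

```python
def validate_capacity(desired: int, min_size: int, max_size: int) -> None:
    """Raise ValueError if `desired` falls outside the group's [min_size, max_size] range."""
    if min_size > max_size:
        raise ValueError(f"min_size {min_size} is greater than max_size {max_size}")
    if not (min_size <= desired <= max_size):
        raise ValueError(
            f"desired capacity {desired} is outside the allowed range {min_size}-{max_size}"
        )


# Values used in this guide: min 3, max 5, desired 3
validate_capacity(desired=3, min_size=3, max_size=5)
print("capacity settings are consistent")
```

Calling `validate_capacity(2, 3, 5)` raises, which mirrors why a scale-down to 2 instances also has to lower the minimum.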

Step 9: Enable Sticky Sessions

# Enable sticky sessions for session state management
aws elbv2 modify-target-group-attributes --target-group-arn $TARGET_GROUP_ARN --attributes Key=stickiness.enabled,Value=true Key=stickiness.lb_cookie.duration_seconds,Value=86400

Write-Host "Sticky sessions enabled - users stay on same instance for 24 hours"

Why Sticky Sessions are Critical:

  • Streamlit stores user session state locally on each instance
  • File uploads and user data must stay on the same instance
  • Without sticky sessions: 400 errors on file uploads
  • Duration: 24 hours (86400 seconds)
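
The effect described above can be illustrated with a toy simulation. This is a simplified model, not how the ALB or Streamlit are implemented: three in-memory "instances", a two-step upload that needs state from the first step, and a router that either rotates targets or pins a session to one target. All class and function names are illustrative.

```python
from itertools import cycle


class Instance:
    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session state lives only in this instance's memory

    def upload(self, session_id, step):
        """Step 1 starts an upload; step 2 needs the state stored by step 1."""
        if step == 1:
            self.sessions[session_id] = "upload-started"
            return 200
        return 200 if session_id in self.sessions else 400


def run(sticky: bool) -> list:
    """Send a two-step upload for one user and return the status codes."""
    instances = [Instance(f"i-{n}") for n in range(3)]
    round_robin = cycle(instances)
    pinned = {}  # session_id -> instance, only used when sticky
    statuses = []
    for step in (1, 2):
        if sticky:
            if "user-a" not in pinned:
                pinned["user-a"] = next(round_robin)
            target = pinned["user-a"]
        else:
            target = next(round_robin)
        statuses.append(target.upload("user-a", step))
    return statuses


print("without stickiness:", run(sticky=False))
print("with stickiness:   ", run(sticky=True))
```

Without stickiness the second request lands on an instance that has never seen the session, which is the 400 case; with stickiness both requests reach the same instance.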

Step 10: Set Up SSL Certificate

# Request SSL certificate from AWS Certificate Manager
$CERT_ARN = aws acm request-certificate --domain-name "portfolio-manager.paydirtapps.com" --validation-method DNS --query 'CertificateArn' --output text

Write-Host "Certificate ARN: $CERT_ARN"

# Get DNS validation records
aws acm describe-certificate --certificate-arn $CERT_ARN --query 'Certificate.DomainValidationOptions[0].ResourceRecord.[Name,Value,Type]' --output table

DNS Validation Process:

  1. Add the CNAME record shown in the output to Cloudflare:
    • Name: _[hash].portfolio-manager (without the domain suffix)
    • Value: _[hash].xlfgrmvvlj.acm-validations.aws.
    • Proxy Status: DNS only (gray cloud, not orange)
  2. Wait 5-15 minutes for validation

# Check certificate status
aws acm describe-certificate --certificate-arn $CERT_ARN --query 'Certificate.Status' --output text
# Should show "ISSUED" when ready
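
ACM returns the validation record name fully qualified, while the Name field above drops the domain suffix. A minimal Python sketch of that trimming is shown here; the `_abc123` label is a made-up placeholder for the real hash, and `cloudflare_record_name` is an illustrative helper.

```python
def cloudflare_record_name(acm_name: str, zone: str) -> str:
    """Strip the trailing dot and the zone suffix from an ACM validation record name."""
    name = acm_name.rstrip(".")
    suffix = "." + zone.rstrip(".")
    if not name.endswith(suffix):
        raise ValueError(f"{acm_name!r} does not belong to zone {zone!r}")
    return name[: -len(suffix)]


# "_abc123" stands in for the hash that ACM generates.
example = cloudflare_record_name(
    "_abc123.portfolio-manager.paydirtapps.com.", "paydirtapps.com"
)
print(example)
```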

Step 11: Add HTTPS Listener

# Add HTTPS listener once certificate is issued
aws elbv2 create-listener --load-balancer-arn $ALB_ARN --protocol HTTPS --port 443 --certificates CertificateArn=$CERT_ARN --default-actions Type=forward,TargetGroupArn=$TARGET_GROUP_ARN

Write-Host "HTTPS listener added successfully!"

# Optional: Redirect HTTP to HTTPS
$HTTP_LISTENER_ARN = aws elbv2 describe-listeners --load-balancer-arn $ALB_ARN --query 'Listeners[?Port==`80`].ListenerArn' --output text
aws elbv2 modify-listener --listener-arn $HTTP_LISTENER_ARN --default-actions Type=redirect,RedirectConfig='{Protocol=HTTPS,Port=443,StatusCode=HTTP_301}'

Write-Host "HTTP to HTTPS redirect enabled!"

Step 12: Update Cloudflare DNS

Final DNS Configuration:

  1. Go to Cloudflare Dashboard → Your domain → DNS
  2. Update the main record for portfolio-manager.paydirtapps.com:
    • Type: CNAME
    • Name: portfolio-manager
    • Target: [your-alb-dns-name].us-east-2.elb.amazonaws.com
    • Proxy Status: DNS only (gray cloud)
    • TTL: Auto

🔧 Verification Commands

Check Instance Health

# Check Auto Scaling Group status
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names "portfolio-manager-asg" --query 'AutoScalingGroups[0].Instances[*].[InstanceId,LifecycleState,HealthStatus]' --output table

# Check target health
aws elbv2 describe-target-health --target-group-arn $TARGET_GROUP_ARN --query 'TargetHealthDescriptions[*].[Target.Id,TargetHealth.State,TargetHealth.Description]' --output table
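
If the same check is run with `--output json` instead of a table, the result can be filtered in a script. The Python sketch below assumes the standard `TargetHealthDescriptions` response shape; the instance IDs and the failing entry are made-up sample data, and `unhealthy_targets` is an illustrative helper.

```python
import json

# Made-up sample in the shape returned by `aws elbv2 describe-target-health --output json`.
SAMPLE_RESPONSE = """
{"TargetHealthDescriptions": [
  {"Target": {"Id": "i-aaa", "Port": 5000}, "TargetHealth": {"State": "healthy"}},
  {"Target": {"Id": "i-bbb", "Port": 5000}, "TargetHealth": {"State": "healthy"}},
  {"Target": {"Id": "i-ccc", "Port": 5000},
   "TargetHealth": {"State": "unhealthy", "Reason": "Target.Timeout",
                    "Description": "Request timed out"}}
]}
"""


def unhealthy_targets(response: dict) -> list:
    """Return the IDs of all targets whose state is anything other than 'healthy'."""
    return [
        d["Target"]["Id"]
        for d in response["TargetHealthDescriptions"]
        if d["TargetHealth"]["State"] != "healthy"
    ]


failing = unhealthy_targets(json.loads(SAMPLE_RESPONSE))
print("unhealthy targets:", failing)
```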

Check Load Balancer Configuration

# Verify listeners
aws elbv2 describe-listeners --load-balancer-arn $ALB_ARN --query 'Listeners[*].[Port,Protocol]' --output table

# Check load balancer attributes
aws elbv2 describe-load-balancer-attributes --load-balancer-arn $ALB_ARN --output table

📈 Scaling Options

Manual Scaling

# Scale up to 5 instances during peak hours
aws autoscaling set-desired-capacity --auto-scaling-group-name "portfolio-manager-asg" --desired-capacity 5

# Scale down to 2 instances during off-hours (the group's minimum is 3, so lower it in the same call)
aws autoscaling update-auto-scaling-group --auto-scaling-group-name "portfolio-manager-asg" --min-size 2 --desired-capacity 2

Automatic CPU-Based Scaling

# Set up auto-scaling based on CPU usage
aws autoscaling put-scaling-policy --auto-scaling-group-name "portfolio-manager-asg" --policy-name "scale-up" --policy-type "TargetTrackingScaling" --target-tracking-configuration '{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
        "PredefinedMetricType": "ASGAverageCPUUtilization"
    }
}'

Scheduled Scaling

# Scale up during peak hours (22:00 UTC = 6 PM EDT; recurrence times are UTC by default)
aws autoscaling put-scheduled-update-group-action --auto-scaling-group-name "portfolio-manager-asg" --scheduled-action-name "evening-scale-up" --recurrence "0 22 * * *" --min-size 3 --desired-capacity 5

# Scale down during off hours (06:00 UTC = 2 AM EDT; --min-size 2 is needed because the group's minimum is 3)
aws autoscaling put-scheduled-update-group-action --auto-scaling-group-name "portfolio-manager-asg" --scheduled-action-name "morning-scale-down" --recurrence "0 6 * * *" --min-size 2 --desired-capacity 2
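
Recurrence expressions are evaluated in UTC unless a time zone is supplied, so the local hour has to be shifted by the UTC offset. The Python sketch below does that conversion with a fixed offset (-5 for EST, -4 for EDT); `utc_cron` is an illustrative helper and it ignores daylight-saving transitions.

```python
def utc_cron(local_hour: int, utc_offset_hours: int, minute: int = 0) -> str:
    """Build a daily cron expression in UTC from a local hour and that zone's UTC offset."""
    utc_hour = (local_hour - utc_offset_hours) % 24
    return f"{minute} {utc_hour} * * *"


print("6 PM EDT ->", utc_cron(18, -4))
print("6 PM EST ->", utc_cron(18, -5))
print("2 AM EDT ->", utc_cron(2, -4))
print("2 AM EST ->", utc_cron(2, -5))
```

The expressions "0 22 * * *" and "0 6 * * *" therefore match 6 PM and 2 AM during daylight time; in winter (EST) the same schedules fire one hour earlier in local time.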

🔄 Instance Type Changes

Upgrade to Larger Instances

# Create new launch template version with m5.xlarge
aws ec2 create-launch-template-version --launch-template-id $LAUNCH_TEMPLATE_ID --launch-template-data '{
    "ImageId": "'$AMI_ID'",
    "InstanceType": "m5.xlarge",
    "SecurityGroupIds": ["'$SECURITY_GROUP_ID'"],
    "UserData": "'$USER_DATA'"
}'

# Update ASG to use new version
aws autoscaling update-auto-scaling-group --auto-scaling-group-name "portfolio-manager-asg" --launch-template LaunchTemplateId=$LAUNCH_TEMPLATE_ID,Version='$Latest'

# Force refresh to replace all instances
aws autoscaling start-instance-refresh --auto-scaling-group-name "portfolio-manager-asg"

💰 Cost Analysis

Monthly Costs (US East 2)

| Component | Before | After | Savings |
|---|---|---|---|
| Compute | 1x m5.xlarge: $280 | 3x m5.large: $207 | $73 |
| Load Balancer | None: $0 | ALB: $16 | -$16 |
| SSL Certificate | Let's Encrypt: $0 | ACM: $0 | $0 |
| Total | $280/month | $223/month | $57/month (20% savings) |

Additional Benefits

  • Performance: 6 total vCPUs vs 4 vCPUs (50% more processing power)
  • Reliability: 3 instances vs 1 instance (high availability)
  • Memory: 24GB total vs 16GB total (50% more memory)
  • Auto-scaling: Can scale up to 5 instances during peak times
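
The figures in this section can be rechecked with a few lines of arithmetic. The Python sketch below uses only the monthly prices and instance sizes quoted in this guide; the variable names are illustrative.

```python
# Monthly costs in USD, as quoted in the cost table
before = {"compute": 280, "load_balancer": 0, "ssl": 0}
after = {"compute": 207, "load_balancer": 16, "ssl": 0}

total_before = sum(before.values())
total_after = sum(after.values())
savings = total_before - total_after
savings_pct = round(100 * savings / total_before, 1)

# Capacity: 3x m5.large (2 vCPUs, 8GB each) vs 1x m5.xlarge (4 vCPUs, 16GB)
vcpu_gain_pct = 100 * (3 * 2 - 4) / 4
memory_gain_pct = 100 * (3 * 8 - 16) / 16

print(f"${total_before} -> ${total_after}: saves ${savings}/month ({savings_pct}%)")
print(f"vCPUs: +{vcpu_gain_pct:.0f}%, memory: +{memory_gain_pct:.0f}%")
```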

🚨 Troubleshooting

Common Issues

1. File Upload 400 Errors

  • Symptom: First upload fails, retry succeeds
  • Cause: User routed to different instance without session state
  • Solution: Enable sticky sessions

aws elbv2 modify-target-group-attributes --target-group-arn $TARGET_GROUP_ARN --attributes Key=stickiness.enabled,Value=true Key=stickiness.lb_cookie.duration_seconds,Value=86400

2. Certificate "Not Secure" Warning

  • Symptom: Browser shows "Not Secure" despite valid certificate
  • Cause: Accessing load balancer DNS instead of domain name
  • Solution: Update Cloudflare DNS to point domain to load balancer

3. Health Check Failures

  • Symptom: Instances show "unhealthy" in target group
  • Cause: Streamlit not responding on port 5000
  • Solution: Check supervisord status on instances

# SSH into instance
sudo supervisorctl status streamlit
sudo supervisorctl restart streamlit

4. SSL Certificate Validation Stuck

  • Symptom: Certificate stays "PENDING_VALIDATION" for hours
  • Cause: DNS validation record not added correctly
  • Solution: Verify CNAME record in Cloudflare, ensure "DNS only" (gray cloud)


🔧 Maintenance Commands

Update Application Code

# Create new AMI with updated code
$NEW_AMI_ID = aws ec2 create-image --instance-id $UPDATED_INSTANCE_ID --name "DFS-Portfolio-Manager-$(Get-Date -Format 'yyyyMMdd-HHmm')" --description "Updated application code" --no-reboot --query 'ImageId' --output text

# Update launch template
aws ec2 create-launch-template-version --launch-template-id $LAUNCH_TEMPLATE_ID --launch-template-data '{"ImageId": "'$NEW_AMI_ID'"}'

# Refresh instances with new code
aws autoscaling start-instance-refresh --auto-scaling-group-name "portfolio-manager-asg"

Monitor Performance

# CloudWatch expects ISO 8601 timestamps, so build them in UTC first
$START_TIME = (Get-Date).ToUniversalTime().AddHours(-1).ToString("yyyy-MM-ddTHH:mm:ssZ")
$END_TIME = (Get-Date).ToUniversalTime().ToString("yyyy-MM-ddTHH:mm:ssZ")

# Check CPU utilization
aws cloudwatch get-metric-statistics --namespace AWS/EC2 --metric-name CPUUtilization --dimensions Name=AutoScalingGroupName,Value=portfolio-manager-asg --statistics Average --start-time $START_TIME --end-time $END_TIME --period 300

# Check load balancer metrics
aws cloudwatch get-metric-statistics --namespace AWS/ApplicationELB --metric-name RequestCount --dimensions Name=LoadBalancer,Value=app/portfolio-manager-alb/[load-balancer-id] --statistics Sum --start-time $START_TIME --end-time $END_TIME --period 300

Cleanup Old Resources

# Stop original single instance (once everything is working)
aws ec2 stop-instances --instance-ids $ORIGINAL_INSTANCE_ID

# List this app's AMIs to decide which old ones to deregister (keep recent ones)
aws ec2 describe-images --owners self --query 'Images[?starts_with(Name, `DFS-Portfolio-Manager`)].[ImageId,Name,CreationDate]' --output table
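
Choosing which AMIs count as "old" can be scripted once the image list is available as JSON. The Python sketch below keeps the newest images and returns the rest; the image IDs and dates are made-up sample data, `amis_to_delete` is an illustrative helper, and it relies on `CreationDate` being an ISO 8601 string that sorts chronologically.

```python
def amis_to_delete(images: list, keep: int = 2) -> list:
    """Return the IDs of all images except the `keep` most recently created ones."""
    ordered = sorted(images, key=lambda img: img["CreationDate"], reverse=True)
    return [img["ImageId"] for img in ordered[keep:]]


# Made-up sample entries in the shape returned by `aws ec2 describe-images --output json`
sample_images = [
    {"ImageId": "ami-old1", "Name": "DFS-Portfolio-Manager-20240101-0900",
     "CreationDate": "2024-01-01T09:00:00.000Z"},
    {"ImageId": "ami-new2", "Name": "DFS-Portfolio-Manager-20240301-0900",
     "CreationDate": "2024-03-01T09:00:00.000Z"},
    {"ImageId": "ami-mid1", "Name": "DFS-Portfolio-Manager-20240201-0900",
     "CreationDate": "2024-02-01T09:00:00.000Z"},
]

candidates = amis_to_delete(sample_images, keep=2)
print("candidates for deregistration:", candidates)
```

The function only selects candidates; deregistering an AMI and deleting its snapshots remain separate, deliberate steps.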

πŸ“ Key Variables Reference

Save these variables for future maintenance:

# Core Infrastructure
$INSTANCE_ID = "i-xxxxxxxxx"                    # Original instance
$AMI_ID = "ami-xxxxxxxxx"                       # Application AMI
$VPC_ID = "vpc-xxxxxxxxx"                       # Virtual Private Cloud
$SECURITY_GROUP_ID = "sg-xxxxxxxxx"             # Security Group
$SUBNET_IDS = "subnet-xxx subnet-yyy subnet-zzz" # Subnets

# Load Balancer
$ALB_ARN = "arn:aws:elasticloadbalancing:us-east-2:xxxx:loadbalancer/app/portfolio-manager-alb/xxxxxxxxxx"
$ALB_DNS = "portfolio-manager-alb-xxxxxxxxxx.us-east-2.elb.amazonaws.com"
$TARGET_GROUP_ARN = "arn:aws:elasticloadbalancing:us-east-2:xxxx:targetgroup/portfolio-manager-targets/xxxxxxxxxx"

# Auto Scaling
$LAUNCH_TEMPLATE_ID = "lt-xxxxxxxxx"             # Launch Template
$ASG_NAME = "portfolio-manager-asg"              # Auto Scaling Group

# SSL
$CERT_ARN = "arn:aws:acm:us-east-2:xxxx:certificate/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

🎯 Success Metrics

After completing this setup, you should have:

  • ✅ 3 healthy instances running your application
  • ✅ Load balancer distributing traffic evenly
  • ✅ HTTPS/SSL working with valid certificate
  • ✅ Sticky sessions preventing upload errors
  • ✅ Auto-scaling capability (3-5 instances)
  • ✅ High availability across multiple availability zones
  • ✅ Cost savings of ~20% compared to single large instance
  • ✅ Better performance with 50% more total CPU and memory

📞 Support

For issues or questions:

  1. Check the troubleshooting section above
  2. Verify all health checks are passing
  3. Review AWS CloudWatch logs for detailed error information
  4. Ensure Cloudflare DNS settings are correct

Document created: $(Get-Date -Format 'yyyy-MM-dd HH:mm:ss')
Architecture: AWS Application Load Balancer + Auto Scaling Group
Application: DFS Portfolio Manager (Streamlit)