Lisa Dunlap committed
Commit f850bde · 1 Parent(s): 0ac505b
Add persistent storage support for Hugging Face Spaces
- Enhanced app.py with automatic persistent storage detection
- Added comprehensive persistent storage utilities
- Added documentation and examples
- Automatic HF_HOME and cache configuration for /data directory
- PERSISTENT_STORAGE_README.md +384 -0
- README.md +1 -0
- app.py +32 -3
- lmmvibes/utils/persistent_storage.py +179 -5
- lmmvibes/utils/persistent_storage_example.py +252 -0
PERSISTENT_STORAGE_README.md
ADDED
@@ -0,0 +1,384 @@
+# Persistent Storage Setup for Hugging Face Spaces
+
+This guide explains how to set up and use persistent storage in Hugging Face Spaces for your LMM-Vibes application.
+
+## Overview
+
+Hugging Face Spaces provides persistent storage at the `/data` directory that persists across app restarts and deployments. This storage is perfect for:
+
+- Caching models and datasets
+- Storing user uploads and results
+- Maintaining application state
+- Saving experiment results
+
+## Quick Start
+
+### 1. Automatic Setup (Already Implemented)
+
+Your application automatically detects and configures persistent storage when running in Hugging Face Spaces:
+
+```python
+# This is already handled in app.py
+if is_persistent_storage_available():
+    # Configure HF cache to persistent storage
+    hf_home = get_hf_home_dir()
+    os.environ.setdefault("HF_HOME", str(hf_home))
+
+    # Set cache directories
+    cache_dir = get_cache_dir()
+    os.environ.setdefault("TRANSFORMERS_CACHE", str(cache_dir / "transformers"))
+    os.environ.setdefault("HF_DATASETS_CACHE", str(cache_dir / "datasets"))
+```
+
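
The setup above relies on `os.environ.setdefault`, which only fills in variables that are not already set, so a value exported in the Space settings still takes precedence. A minimal sketch of that behaviour follows. `configure_hf_cache` is a hypothetical helper written for illustration (it is not part of lmmvibes), and the base directory is passed in rather than hard-coded to `/data`.

```python
import os
from pathlib import Path


def configure_hf_cache(base: Path, env=None) -> dict:
    """Point Hugging Face cache variables at `base` unless already set.

    Hypothetical helper for illustration only. `env` defaults to
    os.environ; passing a plain dict makes the behaviour easy to inspect.
    """
    env = os.environ if env is None else env
    defaults = {
        "HF_HOME": base / ".huggingface",
        "TRANSFORMERS_CACHE": base / ".cache" / "transformers",
        "HF_DATASETS_CACHE": base / ".cache" / "datasets",
    }
    for name, path in defaults.items():
        # setdefault keeps any value that was exported before startup
        env.setdefault(name, str(path))
    return {name: env[name] for name in defaults}
```

These variables are typically read when the Hugging Face libraries are imported, so a call like this would normally run before `transformers` or `datasets` is imported. Recent Transformers releases reportedly prefer `HF_HOME` over `TRANSFORMERS_CACHE`, so the second variable may be redundant.
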
+### 2. Storage Structure
+
+When persistent storage is available, your data is organized as follows:
+
+```
+/data/
+├── app_data/              # Main application data
+│   ├── experiments/       # Pipeline results and experiments
+│   ├── dataframes/        # Saved pandas DataFrames
+│   ├── logs/              # Application logs
+│   └── uploads/           # User uploaded files
+├── .cache/                # Application cache
+│   ├── transformers/      # Hugging Face Transformers cache
+│   └── datasets/          # Hugging Face Datasets cache
+└── .huggingface/          # Hugging Face model cache
+```
+
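
The tree above can be created up front so later writes never hit a missing directory. This is a sketch only: `ensure_layout` and `LAYOUT` are hypothetical names, and the directory list is copied from the tree rather than from the library code.

```python
from pathlib import Path

# Sub-directories copied from the tree above (an assumption, not library code)
LAYOUT = (
    "app_data/experiments",
    "app_data/dataframes",
    "app_data/logs",
    "app_data/uploads",
    ".cache/transformers",
    ".cache/datasets",
    ".huggingface",
)


def ensure_layout(base: Path) -> list[Path]:
    """Create the directory tree under `base` and return the paths."""
    created = []
    for relative in LAYOUT:
        path = base / relative
        # parents=True builds intermediate directories; exist_ok makes reruns safe
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```
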
+## Usage Examples
+
+### Saving Data
+
+```python
+from lmmvibes.utils.persistent_storage import (
+    save_data_to_persistent,
+    save_uploaded_file
+)
+
+# Save binary data
+data_bytes = b"your binary data"
+saved_path = save_data_to_persistent(
+    data=data_bytes,
+    filename="my_data.bin",
+    subdirectory="experiments"
+)
+
+# Save uploaded file from Gradio
+def handle_upload(uploaded_file):
+    if uploaded_file:
+        saved_path = save_uploaded_file(uploaded_file, "user_upload.zip")
+        return f"Saved to: {saved_path}"
+```
+
+### Loading Data
+
+```python
+from lmmvibes.utils.persistent_storage import load_data_from_persistent
+
+# Load binary data
+data_bytes = load_data_from_persistent("my_data.bin", "experiments")
+if data_bytes:
+    # Process the data
+    data = data_bytes.decode('utf-8')
+```
+
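
Both helpers work with raw `bytes`, so structured data has to be serialized before saving and parsed after loading. A small sketch with two hypothetical helpers, `to_bytes` and `from_bytes`, assuming the data is JSON-compatible:

```python
import json


def to_bytes(obj) -> bytes:
    """Serialize a JSON-compatible object to UTF-8 bytes."""
    return json.dumps(obj, indent=2).encode("utf-8")


def from_bytes(data):
    """Inverse of to_bytes; returns None for missing or empty input."""
    if not data:
        # Covers both None (file not found) and b"" (empty file)
        return None
    return json.loads(data.decode("utf-8"))
```

With these, `from_bytes(load_data_from_persistent(...))` would yield the original object, or `None` when the file is missing or empty.
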
+### Listing Files
+
+```python
+from lmmvibes.utils.persistent_storage import list_persistent_files
+
+# List all files
+all_files = list_persistent_files()
+
+# List specific types of files
+json_files = list_persistent_files(subdirectory="experiments", pattern="*.json")
+parquet_files = list_persistent_files(subdirectory="dataframes", pattern="*.parquet")
+```
+
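
According to the module in this commit, `list_persistent_files` is built on `Path.glob`. A pattern such as `*` therefore matches only direct children of the searched directory, and it returns directories as well as files. The sketch below shows the difference between a flat and a recursive listing. `list_files` is a hypothetical stand-in that takes the root directory explicitly.

```python
from pathlib import Path


def list_files(root: Path, pattern: str = "*", recursive: bool = False) -> list[Path]:
    """Return regular files under `root` matching `pattern`, sorted by path."""
    if not root.exists():
        return []
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    # glob/rglob also yield directories, so keep regular files only
    return sorted(p for p in matches if p.is_file())
```
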
+### Checking Storage Status
+
+```python
+from lmmvibes.utils.persistent_storage import get_storage_info
+
+info = get_storage_info()
+print(f"Persistent storage available: {info['persistent_available']}")
+print(f"Data directory: {info['data_dir']}")
+print(f"Free space: {info['storage_paths']['free_gb']:.1f}GB")
+```
+
+## Integration with Your Application
+
+### 1. Data Loading
+
+Your application already uses persistent storage for loading pipeline results:
+
+```python
+# In data_loader.py - automatically uses persistent storage when available
+def load_pipeline_results(results_dir: str):
+    # The function automatically checks for data in persistent storage
+    # Falls back to local storage if persistent storage is not available
+    pass
+```
+
+### 2. Caching
+
+The application automatically caches data in persistent storage:
+
+```python
+# In data_loader.py - DataCache uses persistent storage when available
+class DataCache:
+    @classmethod
+    def get(cls, key: str):
+        # Check persistent storage first, then memory cache
+        return cls._cache.get(key)
+```
+
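
The `DataCache.get` snippet above returns only the in-memory entry, although its comment mentions persistent storage. How the real class consults disk is not shown in this commit. The following is a minimal sketch of one possible lookup order (memory first, then files), using a hypothetical `TwoTierCache` class and assuming JSON-serializable values:

```python
import json
from pathlib import Path
from typing import Any, Optional


class TwoTierCache:
    """In-memory cache backed by JSON files in `directory` (illustrative only)."""

    def __init__(self, directory: Path):
        self._memory: dict = {}
        self._dir = directory
        self._dir.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Assumes keys are simple names that are safe to use as file names
        return self._dir / f"{key}.json"

    def set(self, key: str, value: Any) -> None:
        self._memory[key] = value
        self._path(key).write_text(json.dumps(value), encoding="utf-8")

    def get(self, key: str) -> Optional[Any]:
        if key in self._memory:  # fast path
            return self._memory[key]
        path = self._path(key)
        if path.exists():  # survives restarts when `directory` is persistent
            value = json.loads(path.read_text(encoding="utf-8"))
            self._memory[key] = value
            return value
        return None
```

Pointing `directory` at a folder under the persistent cache directory would let entries outlive a restart, while memory keeps repeated lookups cheap.
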
+### 3. User Uploads
+
+For handling user uploads in Gradio:
+
+```python
+import gradio as gr
+from lmmvibes.utils.persistent_storage import save_uploaded_file
+
+def handle_file_upload(file):
+    if file:
+        saved_path = save_uploaded_file(file, "user_upload.zip")
+        if saved_path:
+            return f"✅ File saved to persistent storage: {saved_path.name}"
+        else:
+            return "❌ Failed to save - persistent storage not available"
+    return "⚠️ No file uploaded"
+
+# In your Gradio interface
+with gr.Blocks() as demo:
+    file_input = gr.File(label="Upload data")
+    upload_btn = gr.Button("Save to persistent storage")
+    result = gr.Textbox(label="Status")
+
+    upload_btn.click(handle_file_upload, inputs=[file_input], outputs=[result])
+```
+
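
The handler above always saves to `user_upload.zip`, so each upload would replace the previous one. If earlier uploads should be kept, one option is to derive a distinct name per upload. This is a sketch with a hypothetical helper, `unique_upload_name`. The timestamp-plus-random-suffix scheme is an illustrative choice and is not something the commit prescribes.

```python
import time
import uuid
from pathlib import Path


def unique_upload_name(original_name: str) -> str:
    """Build a collision-resistant file name that keeps the original extension."""
    name = Path(original_name).name          # drop any directory components
    stem, suffix = Path(name).stem, Path(name).suffix
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime())
    token = uuid.uuid4().hex[:8]             # random part avoids same-second clashes
    return f"{stem}_{stamp}_{token}{suffix}"
```
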
+## Best Practices
+
+### 1. Check Availability
+
+Always check if persistent storage is available before trying to use it:
+
+```python
+from lmmvibes.utils.persistent_storage import is_persistent_storage_available
+
+if is_persistent_storage_available():
+    # Use persistent storage
+    save_data_to_persistent(data, "important_data.json")
+else:
+    # Fall back to local storage or in-memory
+    print("Persistent storage not available")
+```
+
+### 2. Organize Data
+
+Use subdirectories to organize your data:
+
+```python
+# Save experiments in their own directory
+save_data_to_persistent(
+    data=experiment_data,
+    filename=f"{experiment_name}_results.json",
+    subdirectory="experiments"
+)
+
+# Save dataframes separately
+save_data_to_persistent(
+    data=dataframe_bytes,
+    filename=f"{dataset_name}_data.parquet",
+    subdirectory="dataframes"
+)
+```
+
+### 3. Handle Errors Gracefully
+
+```python
+def safe_save_data(data, filename):
+    try:
+        saved_path = save_data_to_persistent(data, filename)
+        if saved_path:
+            return f"✅ Saved to {saved_path}"
+        else:
+            return "❌ Failed to save - storage not available"
+    except Exception as e:
+        return f"❌ Error saving data: {e}"
+```
+
+### 4. Clean Up Old Data
+
+Periodically clean up old files to manage storage space:
+
+```python
+from lmmvibes.utils.persistent_storage import list_persistent_files, delete_persistent_file
+
+def cleanup_old_files(days_old=30):
+    """Delete files older than specified days."""
+    import time
+    cutoff_time = time.time() - (days_old * 24 * 60 * 60)
+
+    for file in list_persistent_files():
+        if file.stat().st_mtime < cutoff_time:
+            delete_persistent_file(file.name)
+```
+
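
With its default arguments, `list_persistent_files()` returns only the direct children of the data directory, directories included. The cleanup loop above would therefore not reach files inside `experiments/` or `uploads/`, and it could try to unlink a directory. A sketch of a recursive variant follows. `remove_older_than` is a hypothetical function that takes the root directory explicitly so it can be tried on any folder.

```python
import time
from pathlib import Path


def remove_older_than(root: Path, days: float, now=None) -> list[Path]:
    """Delete regular files under `root` whose mtime is older than `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 24 * 60 * 60
    removed = []
    # Materialize the listing first so deletions do not disturb the walk
    for path in list(root.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```
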
+## Troubleshooting
+
+### 1. Storage Not Available
+
+If persistent storage is not working:
+
+```python
+from lmmvibes.utils.persistent_storage import get_storage_info
+
+info = get_storage_info()
+print(f"Storage available: {info['persistent_available']}")
+print(f"Data directory: {info['data_dir']}")
+```
+
+### 2. Permission Issues
+
+If you encounter permission issues:
+
+```python
+# The utilities automatically create directories with proper permissions
+# If issues persist, check if /data exists and is writable
+import os
+if os.path.isdir("/data") and os.access("/data", os.W_OK):
+    print("✅ Persistent storage is accessible and writable")
+else:
+    print("❌ Persistent storage not accessible")
+```
+
+### 3. Storage Full
+
+Monitor storage usage:
+
+```python
+info = get_storage_info()
+if info['storage_paths']:
+    usage_pct = (info['storage_paths']['used_gb'] / info['storage_paths']['total_gb']) * 100
+    if usage_pct > 90:
+        print(f"⚠️ Storage nearly full: {usage_pct:.1f}% used")
+        # Implement cleanup logic
+```
+
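
`get_storage_info()` leaves `storage_paths` empty when persistent storage is unavailable or the disk query fails, so code that indexes into it needs a guard, as the snippet above has. The same check can be wrapped in a small function. `usage_percent` is a hypothetical name, and the dictionary keys are the ones used in this commit.

```python
from typing import Optional


def usage_percent(storage_paths: dict) -> Optional[float]:
    """Return used space as a percentage, or None when it cannot be computed."""
    total = storage_paths.get("total_gb")
    used = storage_paths.get("used_gb")
    if not total or used is None:
        # Empty dict (storage unavailable) or a zero-sized volume
        return None
    return used / total * 100
```
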
+## Migration from Local Storage
+
+If you're migrating from local storage to persistent storage:
+
+1. **Backup existing data**: Copy your local `data/` directory to persistent storage
+2. **Update paths**: Use the persistent storage utilities instead of hardcoded paths
+3. **Test thoroughly**: Ensure all functionality works with persistent storage
+4. **Monitor usage**: Keep track of storage usage and implement cleanup
+
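
The first migration step can be scripted. The sketch below copies a local directory tree into a destination and skips files that already exist there, so it is safe to rerun. `migrate_directory` is a hypothetical helper, and both paths are supplied by the caller (for example a local `data/` folder and a folder under `/data`).

```python
import shutil
from pathlib import Path


def migrate_directory(source: Path, destination: Path) -> int:
    """Copy files from `source` into `destination`; return how many were copied."""
    copied = 0
    for path in source.rglob("*"):
        if not path.is_file():
            continue
        target = destination / path.relative_to(source)
        if target.exists():
            continue  # never overwrite data that is already in place
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves timestamps
        copied += 1
    return copied
```
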
+## Example: Complete Integration
+
+Here's a complete example of integrating persistent storage into your application:
+
+```python
+import gradio as gr
+import json
+import pandas as pd
+from lmmvibes.utils.persistent_storage import (
+    save_data_to_persistent,
+    load_data_from_persistent,
+    list_persistent_files,
+    get_storage_info,
+    is_persistent_storage_available
+)
+
+def save_experiment_results(results_data, experiment_name):
+    """Save experiment results to persistent storage."""
+    if not is_persistent_storage_available():
+        return "❌ Persistent storage not available"
+
+    try:
+        results_json = json.dumps(results_data, indent=2)
+        results_bytes = results_json.encode('utf-8')
+
+        filename = f"{experiment_name}_results.json"
+        saved_path = save_data_to_persistent(
+            data=results_bytes,
+            filename=filename,
+            subdirectory="experiments"
+        )
+
+        if saved_path:
+            return f"✅ Saved experiment to: {saved_path.name}"
+        else:
+            return "❌ Failed to save experiment"
+    except Exception as e:
+        return f"❌ Error: {e}"
+
+def load_experiment_results(experiment_name):
+    """Load experiment results from persistent storage."""
+    filename = f"{experiment_name}_results.json"
+    results_bytes = load_data_from_persistent(
+        filename=filename,
+        subdirectory="experiments"
+    )
+
+    if results_bytes:
+        results_data = json.loads(results_bytes.decode('utf-8'))
+        return json.dumps(results_data, indent=2)
+    else:
+        return "No results found"
+
+def get_available_experiments():
+    """List all available experiments."""
+    experiment_files = list_persistent_files(subdirectory="experiments", pattern="*_results.json")
+    if experiment_files:
+        return "\n".join([f.name for f in experiment_files])
+    else:
+        return "No experiments found"
+
+# Gradio interface
+with gr.Blocks(title="Persistent Storage Demo") as demo:
+    gr.Markdown("# Persistent Storage Demo")
+
+    with gr.Tab("Save Experiment"):
+        experiment_name = gr.Textbox(label="Experiment Name")
+        results_json = gr.Textbox(label="Results (JSON)", lines=5)
+        save_btn = gr.Button("Save Experiment")
+        save_result = gr.Textbox(label="Save Result")
+
+        save_btn.click(
+            save_experiment_results,
+            inputs=[results_json, experiment_name],
+            outputs=[save_result]
+        )
+
+    with gr.Tab("Load Experiment"):
+        load_experiment_name = gr.Textbox(label="Experiment Name")
+        load_btn = gr.Button("Load Experiment")
+        load_result = gr.Textbox(label="Loaded Results", lines=10)
+
+        load_btn.click(
+            load_experiment_results,
+            inputs=[load_experiment_name],
+            outputs=[load_result]
+        )
+
+    with gr.Tab("Storage Info"):
+        info_btn = gr.Button("Get Storage Info")
+        storage_info = gr.Textbox(label="Storage Information", lines=10)
+
+        def get_info():
+            info = get_storage_info()
+            return json.dumps(info, indent=2)
+
+        info_btn.click(get_info, outputs=[storage_info])
+
+if __name__ == "__main__":
+    demo.launch()
+```
+
+This comprehensive setup ensures your application can take full advantage of Hugging Face Spaces' persistent storage capabilities while maintaining backward compatibility with local development.
README.md
CHANGED
@@ -2,6 +2,7 @@
 title: Whatever This Is
 colorFrom: yellow
 colorTo: gray
+emoji: 🇬🇮
 sdk: gradio
 sdk_version: 5.41.1
 app_file: app.py
app.py
CHANGED
@@ -1,10 +1,39 @@
 import os
+from pathlib import Path
 
 from lmmvibes.vis_gradio.app import launch_app
+from lmmvibes.utils.persistent_storage import (
+    get_hf_home_dir,
+    get_cache_dir,
+    is_persistent_storage_available,
+    get_storage_info
+)
 
 # Launch the app for Hugging Face Spaces
 if __name__ == "__main__":
-    #
-    if
-
+    # Set up persistent storage for Hugging Face Spaces
+    if is_persistent_storage_available():
+        print("🚀 Persistent storage available - configuring for HF Spaces")
+
+        # Set Hugging Face cache to persistent storage
+        hf_home = get_hf_home_dir()
+        os.environ.setdefault("HF_HOME", str(hf_home))
+
+        # Set cache directory for other libraries
+        cache_dir = get_cache_dir()
+        os.environ.setdefault("TRANSFORMERS_CACHE", str(cache_dir / "transformers"))
+        os.environ.setdefault("HF_DATASETS_CACHE", str(cache_dir / "datasets"))
+
+        # Print storage info
+        storage_info = get_storage_info()
+        print(f"📁 Data directory: {storage_info['data_dir']}")
+        print(f"🗄️ Cache directory: {storage_info['cache_dir']}")
+        print(f"🤗 HF Home: {storage_info['hf_home']}")
+
+        if storage_info['storage_paths']:
+            print(f"💾 Storage: {storage_info['storage_paths']['free_gb']:.1f}GB free / {storage_info['storage_paths']['total_gb']:.1f}GB total")
+    else:
+        print("⚠️ Persistent storage not available - using local storage")
+
+    # Launch the Gradio app
     launch_app(share=False, server_name="0.0.0.0", server_port=7860)
lmmvibes/utils/persistent_storage.py
CHANGED
@@ -1,14 +1,22 @@
 """
 Utilities for persistent storage in Hugging Face Spaces.
+
+This module provides utilities for managing persistent storage in Hugging Face Spaces,
+including data directories, cache management, and file operations.
 """
 import os
+import shutil
 from pathlib import Path
-from typing import Optional
+from typing import Optional, Union
+import tempfile
 
 
 def get_persistent_data_dir() -> Optional[Path]:
     """Get the persistent data directory if available.
 
+    In Hugging Face Spaces, this will be `/data/app_data`.
+    Returns None if persistent storage is not available.
+
     Returns:
         Path to persistent storage directory if available, None otherwise.
     """
@@ -22,6 +30,9 @@ def get_persistent_data_dir() -> Optional[Path]:
 def get_cache_dir() -> Path:
     """Get the appropriate cache directory (persistent if available, temp otherwise).
 
+    In Hugging Face Spaces, this will be `/data/.cache`.
+    Falls back to temp directory in local development.
+
     Returns:
         Path to cache directory.
     """
@@ -31,10 +42,27 @@ def get_cache_dir() -> Path:
         return cache_dir
     else:
         # Fallback to temp directory
-        import tempfile
         return Path(tempfile.gettempdir()) / "app_cache"
 
 
+def get_hf_home_dir() -> Path:
+    """Get the Hugging Face home directory for model caching.
+
+    In Hugging Face Spaces, this will be `/data/.huggingface`.
+    Falls back to default ~/.cache/huggingface in local development.
+
+    Returns:
+        Path to HF home directory.
+    """
+    if os.path.isdir("/data"):
+        hf_home = Path("/data/.huggingface")
+        hf_home.mkdir(exist_ok=True)
+        return hf_home
+    else:
+        # Fallback to default location
+        return Path.home() / ".cache" / "huggingface"
+
+
 def save_uploaded_file(uploaded_file, filename: str) -> Optional[Path]:
     """Save an uploaded file to persistent storage.
 
@@ -51,12 +79,112 @@ def save_uploaded_file(uploaded_file, filename: str) -> Optional[Path]:
         save_path.parent.mkdir(parents=True, exist_ok=True)
 
         # Copy the uploaded file to persistent storage
-
-
+        if hasattr(uploaded_file, 'name'):
+            # Gradio file object
+            shutil.copy2(uploaded_file.name, save_path)
+        else:
+            # Direct file path
+            shutil.copy2(uploaded_file, save_path)
+        return save_path
+    return None
+
+
+def save_data_to_persistent(data: bytes, filename: str, subdirectory: str = "") -> Optional[Path]:
+    """Save binary data to persistent storage.
+
+    Args:
+        data: Binary data to save
+        filename: Name to save the file as
+        subdirectory: Optional subdirectory within persistent storage
+
+    Returns:
+        Path to saved file if successful, None otherwise.
+    """
+    persistent_dir = get_persistent_data_dir()
+    if persistent_dir:
+        if subdirectory:
+            save_dir = persistent_dir / subdirectory
+            save_dir.mkdir(exist_ok=True)
+        else:
+            save_dir = persistent_dir
+
+        save_path = save_dir / filename
+        save_path.parent.mkdir(parents=True, exist_ok=True)
+
+        with open(save_path, 'wb') as f:
+            f.write(data)
         return save_path
     return None
 
 
+def load_data_from_persistent(filename: str, subdirectory: str = "") -> Optional[bytes]:
+    """Load binary data from persistent storage.
+
+    Args:
+        filename: Name of the file to load
+        subdirectory: Optional subdirectory within persistent storage
+
+    Returns:
+        Binary data if successful, None otherwise.
+    """
+    persistent_dir = get_persistent_data_dir()
+    if persistent_dir:
+        if subdirectory:
+            load_path = persistent_dir / subdirectory / filename
+        else:
+            load_path = persistent_dir / filename
+
+        if load_path.exists():
+            with open(load_path, 'rb') as f:
+                return f.read()
+    return None
+
+
+def list_persistent_files(subdirectory: str = "", pattern: str = "*") -> list[Path]:
+    """List files in persistent storage.
+
+    Args:
+        subdirectory: Optional subdirectory within persistent storage
+        pattern: Glob pattern to match files (e.g., "*.json", "data_*")
+
+    Returns:
+        List of Path objects for matching files.
+    """
+    persistent_dir = get_persistent_data_dir()
+    if persistent_dir:
+        if subdirectory:
+            search_dir = persistent_dir / subdirectory
+        else:
+            search_dir = persistent_dir
+
+        if search_dir.exists():
+            return list(search_dir.glob(pattern))
+    return []
+
+
+def delete_persistent_file(filename: str, subdirectory: str = "") -> bool:
+    """Delete a file from persistent storage.
+
+    Args:
+        filename: Name of the file to delete
+        subdirectory: Optional subdirectory within persistent storage
+
+    Returns:
+        True if successful, False otherwise.
+    """
+    persistent_dir = get_persistent_data_dir()
+    if persistent_dir:
+        if subdirectory:
+            file_path = persistent_dir / subdirectory / filename
+        else:
+            file_path = persistent_dir / filename
+
+        if file_path.exists():
+            file_path.unlink()
+            return True
+    return False
+
+
 def is_persistent_storage_available() -> bool:
     """Check if persistent storage is available.
 
@@ -77,4 +205,50 @@ def get_persistent_results_dir() -> Optional[Path]:
         results_dir = persistent_dir / "results"
         results_dir.mkdir(exist_ok=True)
         return results_dir
-    return None
+    return None
+
+
+def get_persistent_logs_dir() -> Optional[Path]:
+    """Get the persistent logs directory for storing application logs.
+
+    Returns:
+        Path to persistent logs directory if available, None otherwise.
+    """
+    persistent_dir = get_persistent_data_dir()
+    if persistent_dir:
+        logs_dir = persistent_dir / "logs"
+        logs_dir.mkdir(exist_ok=True)
+        return logs_dir
+    return None
+
+
+def get_storage_info() -> dict:
+    """Get information about available storage.
+
+    Returns:
+        Dictionary with storage information.
+    """
+    info = {
+        "persistent_available": is_persistent_storage_available(),
+        "data_dir": None,
+        "cache_dir": str(get_cache_dir()),
+        "hf_home": str(get_hf_home_dir()),
+        "storage_paths": {}
+    }
+
+    if info["persistent_available"]:
+        data_dir = get_persistent_data_dir()
+        info["data_dir"] = str(data_dir)
+
+        # Check available space
+        try:
+            total, used, free = shutil.disk_usage(data_dir)
+            info["storage_paths"] = {
+                "total_gb": round(total / (1024**3), 2),
+                "used_gb": round(used / (1024**3), 2),
+                "free_gb": round(free / (1024**3), 2)
+            }
+        except OSError:
+            pass
+
+    return info
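
The save, load, and delete helpers in this module join `filename` and `subdirectory` onto the data directory without validating them. A value such as `../x`, or an absolute path, could therefore resolve outside the data directory. One possible guard is sketched below. `resolve_inside` is a hypothetical function, not something this commit contains, and whether such a check is needed depends on where the names come from.

```python
from pathlib import Path


def resolve_inside(base: Path, *parts: str) -> Path:
    """Join `parts` onto `base` and reject results that escape `base`."""
    base = base.resolve()
    candidate = base.joinpath(*parts).resolve()
    # Accept the base itself or anything that has the base as an ancestor
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"{candidate} is outside {base}")
    return candidate
```

A caller would build the path with `resolve_inside(data_dir, subdirectory, filename)` before opening or deleting the file, and treat `ValueError` as a rejected request.
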
lmmvibes/utils/persistent_storage_example.py
ADDED
|
@@ -0,0 +1,252 @@
"""
Example usage of persistent storage utilities for Hugging Face Spaces.

This file demonstrates how to use the persistent storage utilities
for saving and loading data in Hugging Face Spaces.
"""

import json
import pandas as pd
from pathlib import Path

from .persistent_storage import (
    get_persistent_data_dir,
    get_cache_dir,
    get_hf_home_dir,
    save_data_to_persistent,
    load_data_from_persistent,
    save_uploaded_file,
    list_persistent_files,
    delete_persistent_file,
    is_persistent_storage_available,
    get_storage_info
)


def example_save_results(results_data: dict, experiment_name: str):
    """Example: Save pipeline results to persistent storage.

    Args:
        results_data: Dictionary containing pipeline results
        experiment_name: Name of the experiment
    """
    if not is_persistent_storage_available():
        print("⚠️ Persistent storage not available - skipping save")
        return None

    # Save results as JSON
    results_json = json.dumps(results_data, indent=2)
    results_bytes = results_json.encode('utf-8')

    filename = f"{experiment_name}_results.json"
    saved_path = save_data_to_persistent(
        data=results_bytes,
        filename=filename,
        subdirectory="experiments"
    )

    if saved_path:
        print(f"✅ Saved results to: {saved_path}")
        return saved_path
    else:
        print("❌ Failed to save results")
        return None


def example_load_results(experiment_name: str):
    """Example: Load pipeline results from persistent storage.

    Args:
        experiment_name: Name of the experiment

    Returns:
        Dictionary containing the loaded results or None
    """
    filename = f"{experiment_name}_results.json"
    results_bytes = load_data_from_persistent(
        filename=filename,
        subdirectory="experiments"
    )

    if results_bytes:
        results_data = json.loads(results_bytes.decode('utf-8'))
        print(f"✅ Loaded results from: {filename}")
        return results_data
    else:
        print(f"❌ No results found for: {filename}")
        return None


def example_save_dataframe(df: pd.DataFrame, filename: str):
    """Example: Save a pandas DataFrame to persistent storage.

    Args:
        df: DataFrame to save
        filename: Name of the file (with .parquet extension)
    """
    if not is_persistent_storage_available():
        print("⚠️ Persistent storage not available - skipping save")
        return None

    # Convert DataFrame to parquet bytes
    try:
        parquet_bytes = df.to_parquet()
        saved_path = save_data_to_persistent(
            data=parquet_bytes,
            filename=filename,
            subdirectory="dataframes"
        )

        if saved_path:
            print(f"✅ Saved DataFrame to: {saved_path}")
            return saved_path
        else:
            print("❌ Failed to save DataFrame")
            return None
    except Exception as e:
        print(f"❌ Error saving DataFrame: {e}")
        return None


def example_list_saved_files():
    """Example: List all files saved in persistent storage."""
    if not is_persistent_storage_available():
        print("⚠️ Persistent storage not available")
        return []

    print("📁 Files in persistent storage:")

    # List all files
    all_files = list_persistent_files()
    if all_files:
        for file in all_files:
            print(f"  - {file.name}")
    else:
        print("  No files found")

    # List experiment files
    experiment_files = list_persistent_files(subdirectory="experiments", pattern="*.json")
    if experiment_files:
        print("\n🔬 Experiment files:")
        for file in experiment_files:
            print(f"  - {file.name}")

    # List dataframe files
    dataframe_files = list_persistent_files(subdirectory="dataframes", pattern="*.parquet")
    if dataframe_files:
        print("\n📊 DataFrame files:")
        for file in dataframe_files:
            print(f"  - {file.name}")

    return all_files


def example_storage_cleanup(days_old: int = 30):
    """Example: Clean up old files from persistent storage.

    Args:
        days_old: Delete files older than this many days
    """
    if not is_persistent_storage_available():
        print("⚠️ Persistent storage not available")
        return

    import time

    cutoff_time = time.time() - (days_old * 24 * 60 * 60)

    print(f"🧹 Cleaning up files older than {days_old} days...")

    # List all files and check their modification time
    all_files = list_persistent_files()
    deleted_count = 0

    for file in all_files:
        if file.stat().st_mtime < cutoff_time:
            # Only the bare file name is passed on, so this assumes the file
            # sits at the top level of the persistent data directory.
            if delete_persistent_file(file.name):
                print(f"🗑️ Deleted: {file.name}")
                deleted_count += 1

    print(f"✅ Cleanup complete - deleted {deleted_count} files")


def example_storage_info():
    """Example: Display information about persistent storage."""
    info = get_storage_info()

    print("📊 Persistent Storage Information:")
    print(f"  Available: {info['persistent_available']}")

    if info['persistent_available']:
        print(f"  Data directory: {info['data_dir']}")
        print(f"  Cache directory: {info['cache_dir']}")
        print(f"  HF Home: {info['hf_home']}")

        if info['storage_paths']:
            print(f"  Total storage: {info['storage_paths']['total_gb']:.1f}GB")
            print(f"  Used storage: {info['storage_paths']['used_gb']:.1f}GB")
            print(f"  Free storage: {info['storage_paths']['free_gb']:.1f}GB")

            # Calculate usage percentage; total_gb is rounded, so guard against zero
            total_gb = info['storage_paths']['total_gb']
            if total_gb > 0:
                usage_pct = (info['storage_paths']['used_gb'] / total_gb) * 100
                print(f"  Usage: {usage_pct:.1f}%")


# Example usage in a Gradio app
def example_gradio_integration():
    """Example: How to integrate persistent storage with Gradio."""

    def save_uploaded_data(uploaded_file):
        """Save a file uploaded through Gradio."""
        if uploaded_file:
            saved_path = save_uploaded_file(uploaded_file, "user_upload.txt")
            if saved_path:
                return f"✅ File saved to persistent storage: {saved_path.name}"
            else:
                return "❌ Failed to save file - persistent storage not available"
        return "⚠️ No file uploaded"

    def load_user_data():
        """Load previously uploaded data."""
        data_bytes = load_data_from_persistent("user_upload.txt")
        if data_bytes:
            return data_bytes.decode('utf-8')
        return "No data found"

    # This would be used in a Gradio interface like:
    # import gradio as gr
    #
    # with gr.Blocks() as demo:
    #     file_input = gr.File(label="Upload file")
    #     upload_btn = gr.Button("Save to persistent storage")
    #     download_btn = gr.Button("Load from persistent storage")
    #
    #     upload_btn.click(save_uploaded_data, inputs=[file_input])
    #     download_btn.click(load_user_data)


if __name__ == "__main__":
    # Run examples. Because of the relative import at the top of this file,
    # run it as a module (python -m lmmvibes.utils.persistent_storage_example)
    # rather than as a standalone script.
    print("🔍 Persistent Storage Examples")
    print("=" * 40)

    example_storage_info()
    print()

    example_list_saved_files()
    print()

    # Example: Save some test data
    test_data = {"experiment": "test", "results": [1, 2, 3], "timestamp": "2024-01-01"}
    example_save_results(test_data, "test_experiment")
    print()

    # Example: Load the test data
    loaded_data = example_load_results("test_experiment")
    if loaded_data:
        print(f"📊 Loaded data: {loaded_data}")
    print()

    # Example: List files again
    example_list_saved_files()
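The age-based cleanup in `example_storage_cleanup` relies on the project's own `list_persistent_files` and `delete_persistent_file` helpers. For comparison, here is a minimal standalone sketch of the same idea using only `pathlib`; the function name `cleanup_old_files` is hypothetical and not part of `lmmvibes`. It walks subdirectories and deletes by full path, so files stored under folders such as `experiments/` are handled without depending on how a file name is resolved.

```python
import os
import time
from pathlib import Path


def cleanup_old_files(root: Path, days_old: int = 30) -> list:
    """Delete regular files under root whose mtime is older than days_old days.

    Hypothetical helper for illustration; returns the paths that were removed.
    """
    cutoff = time.time() - days_old * 24 * 60 * 60
    deleted = []
    # Materialise the listing first so deletions do not disturb the traversal
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            if path.stat().st_mtime < cutoff:
                path.unlink()
                deleted.append(path)
        except OSError:
            # The file vanished or could not be removed; skip it
            continue
    return deleted


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "experiments").mkdir()
        old_file = root / "experiments" / "old_results.json"
        old_file.write_text("{}")
        # Backdate the file by 40 days so it falls past the 30-day cutoff
        stale = time.time() - 40 * 24 * 60 * 60
        os.utime(old_file, (stale, stale))
        print(cleanup_old_files(root, days_old=30))
```

Returning the deleted paths rather than printing them keeps the helper easy to test and lets the caller decide how to report the result.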