README.md
CHANGED
@@ -89,6 +89,8 @@ python app.py
 | Qwen2-VL-7B | 1024/256 | 1 | Inference | FP16 | 1 |
 | VILA-1.5-13B | 2048/512 | 2 | Inference | BF16 | 1 |
 | Qwen2-Audio-7B | 1024/256 | 1 | Inference | FP16 | 1 |
+| PhysicsNeMo-FNO-Large | 512/128 | 8 | Training | FP32 | 1 |
+| PhysicsNeMo-GraphCast-Medium | 1024/256 | 4 | Training | FP16 | 1 |

 ## CUDA Recommendations

@@ -133,6 +135,12 @@ The application provides tailored CUDA version recommendations:
 - **Qwen-Audio**: Base, Chat variants
 - **Qwen2-Audio**: 7B

+#### Physics-ML Models (NVIDIA PhysicsNeMo)
+- **Fourier Neural Operators (FNO)**: Small (1M), Medium (10M), Large (50M)
+- **Physics-Informed Neural Networks (PINN)**: Small (0.5M), Medium (5M), Large (20M)
+- **GraphCast**: Small (50M), Medium (200M), Large (1B) - for weather/climate modeling
+- **Spherical FNO (SFNO)**: Small (25M), Medium (100M), Large (500M) - for global simulations
+
 ### Precision Impact
 - **FP32**: Full precision (4 bytes per parameter)
 - **FP16/BF16**: Half precision (2 bytes per parameter)

@@ -145,6 +153,17 @@ The application provides tailored CUDA version recommendations:
 - **Memory Overhead**: Additional memory for vision/audio encoders and cross-modal attention
 - **Token Estimation**: Consider multimodal inputs when calculating token counts

+### PhysicsNeMo Considerations
+- **Grid-Based Data**: Physics models work with spatial/temporal grids rather than text tokens
+- **Batch Training**: Physics-ML models typically require larger batch sizes for stable training
+- **Memory Patterns**: Different from LLMs - less KV cache, more gradient memory for PDE constraints
+- **Precision Requirements**: Many physics simulations require FP32 for numerical stability
+- **Use Cases**:
+  - **FNO**: Solving PDEs on regular grids (fluid dynamics, heat transfer)
+  - **PINN**: Physics-informed training with PDE constraints
+  - **GraphCast**: Weather prediction and climate modeling
+  - **SFNO**: Global atmospheric and oceanic simulations
+
 ## Limitations

 - Estimates are approximate and may vary based on:
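The "Precision Impact" bullets in the README hunk above (4 bytes per FP32 parameter, 2 per FP16/BF16 parameter) can be sanity-checked with a short sketch. `param_memory_gb` is a hypothetical helper written for illustration, not a function from this Space's app.py:

```python
# Bytes per parameter by precision, matching the README's "Precision Impact" bullets.
BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "BF16": 2}

def param_memory_gb(num_params: float, precision: str) -> float:
    """Rough memory (GB) needed just to hold the model weights."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

# e.g. GraphCast-Large (1B parameters) from the new PhysicsNeMo section:
print(param_memory_gb(1e9, "FP32"))  # 4.0
print(param_memory_gb(1e9, "FP16"))  # 2.0
```

This covers weights only; as the README notes, training runs also carry gradient and optimizer memory on top of this.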
app.py
CHANGED
@@ -272,10 +272,10 @@ def estimate_nodes_interface(

     # Validate inputs
     if input_tokens <= 0 or output_tokens <= 0:
-        return "Please enter valid token counts (> 0)", "", None, ""
+        return "Please enter valid token counts (> 0)", "", None, "## ⚠️ <span style='color: #E74C3C;'>**Invalid Input: Token counts must be > 0**</span>"

     if batch_size <= 0:
-        return "Please enter a valid batch size (> 0)", "", None, ""
+        return "Please enter a valid batch size (> 0)", "", None, "## ⚠️ <span style='color: #E74C3C;'>**Invalid Input: Batch size must be > 0**</span>"

     # Calculate node requirements
     nodes_needed, explanation, breakdown = estimate_h100_nodes(
@@ -288,7 +288,7 @@ def estimate_nodes_interface(
     # Create performance chart
     fig = create_performance_chart(breakdown)

-    return explanation, cuda_rec, fig, f"
+    return explanation, cuda_rec, fig, f"## 🖥️ <span style='color: #4A90E2;'>**Estimated H100 Nodes Required: {nodes_needed}**</span>"

 # Create Gradio interface
 def create_interface():
@@ -345,7 +345,7 @@ def create_interface():
     with gr.Column(scale=2):
         gr.Markdown("## Results")

-        node_count = gr.Markdown("
+        node_count = gr.Markdown("## 🖥️ <span style='color: #4A90E2;'>**Ready to estimate...**</span>")

     with gr.Tab("📊 Detailed Analysis"):
         detailed_output = gr.Markdown()
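The app.py hunks above follow one pattern: the fourth element of the return tuple is a Markdown string rendered by the `node_count` component, so validation failures return early with a styled error message instead of an empty string. A minimal, self-contained sketch of that pattern, using a hypothetical `validate_inputs` helper rather than the app's actual function:

```python
# Early-return validation: the message doubles as the Markdown shown in the UI.
def validate_inputs(input_tokens: int, output_tokens: int, batch_size: int):
    """Return (ok, markdown_message) for the results panel."""
    if input_tokens <= 0 or output_tokens <= 0:
        return False, "## ⚠️ **Invalid Input: Token counts must be > 0**"
    if batch_size <= 0:
        return False, "## ⚠️ **Invalid Input: Batch size must be > 0**"
    return True, ""

print(validate_inputs(1024, 256, 1))    # (True, '')
print(validate_inputs(0, 256, 1)[0])    # False
print(validate_inputs(1024, 256, 0)[1])  # batch-size error message
```

Returning the error in the same tuple slot as the success message keeps the Gradio output wiring unchanged; only the string content varies.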