# Pentachora Adaptive Encoded (Multi-Channel)

**A geometry-regularized classifier with a 5-frequency encoder and pentachoron constellation heads.**

*Author:* **AbstractPhil** · *Quartermaster:* **Mirel** · GPT 4o - GPT 5 - GPT 5 Fast - GPT 5 Thinking - GPT 5 Pro

*Assistants:* Claude Opus 4.1 - Claude Sonnet 4 - Gemini 2.5

*License:* **Apache-2.0**

---

## 📌 TL;DR

This repository hosts training runs of a **frequency-aware encoder** (PentaFreq) paired with a **pentachoron constellation classifier** (dispatchers + specialists). The model blends classic cross-entropy with **two contrastive objectives** (dual InfoNCE and **ROSE-weighted** InfoNCE) and a **geometric regularizer** that keeps the learned vertex geometry sane.

It supports **1-channel and 3-channel** 28×28 inputs (e.g., TorchVision MNIST variants and MedMNIST 2D sets), is **seeded/deterministic**, and ships full artifacts (weights, plots, history, TensorBoard) for review.

---

## 🧠 Model overview

### Architecture

- **PentaFreq Encoder (multi-channel)**
  - 5 spectral branches (ultra-high, high, mid, low-mid, low) → per-branch encoders → cross-attention → MLP fusion → **normalized latent `z`**.
  - Channel-aware: supports **C ∈ {1,3}**; input is flattened to `C×28×28`.

- **Pentachoron Constellation Classifier**
  - **Two stacks** (dispatchers & specialists), each containing **pentachora** (5-vertex simplices) with learnable vertices.
  - A **coherence gate** modulates vertex logits; **group heads** (one per vertex) score class subsets; **pair aggregation** + a fusion MLP produce the final logits.
  - Geometry terms encourage valid simplex structure and separation between the two stacks (a shape-level sketch of the encoder path follows this list).
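
For orientation, here is a minimal, runnable sketch of the encoder path described above (branch encoders → attention across branch tokens → MLP fusion → unit-norm latent). The class name, sizes, and the linear "branch" stand-ins are illustrative assumptions; the real PentaFreq uses frequency-split branches and the notebook's own modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPentaFreqSketch(nn.Module):
    """Shape-level sketch only: 5 branch encoders -> attention over the
    5 branch tokens -> MLP fusion -> L2-normalized latent z."""
    def __init__(self, in_dim: int, d: int = 128, n_branches: int = 5):
        super().__init__()
        # Stand-ins for the spectral branch encoders (ultra-high ... low).
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, d), nn.GELU()) for _ in range(n_branches)
        )
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.fuse = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [B, C*28*28]
        tokens = torch.stack([b(x) for b in self.branches], dim=1)  # [B, 5, d]
        mixed, _ = self.attn(tokens, tokens, tokens)                # attention across branches
        z = self.fuse(mixed.mean(dim=1))                            # [B, d]
        return F.normalize(z, dim=1)                                # unit-norm latent

x = torch.randn(8, 3 * 28 * 28)           # a flattened 3×28×28 batch
z = TinyPentaFreqSketch(3 * 28 * 28)(x)   # [8, 128]
```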

### Objective

- **CE** – main cross-entropy on logits.
- **Dual InfoNCE (stable)** – encourages `z` to match the **correct vertex** across both stacks.
- **ROSE-weighted InfoNCE (stable)** – same idea, but reweights samples by an analytic **ROSE** similarity (triadic cosine + magnitude).
- **Geometry Regularization** – stable Cayley–Menger **proxy** (eigval-based), edge-variance, center separation, and a **soft radius control**; ramped in early epochs.

> All contrastive losses use `log_softmax` + `gather` to avoid `inf − inf` traps; all paths **nan-sanitize** defensively.
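
As a concrete illustration of that pattern, here is a minimal stable InfoNCE step over vertex similarities. The names (`z`, `vertices`, `temperature`) and shapes are assumptions for the sketch, not the notebook's exact API.

```python
import torch
import torch.nn.functional as F

def stable_info_nce(z, vertices, target_idx, temperature=0.07):
    """z: [B, D] unit-norm latents; vertices: [V, D] unit-norm vertex vectors;
    target_idx: [B] index of the correct vertex per sample."""
    logits = z @ vertices.t() / temperature            # [B, V] similarities
    logp = F.log_softmax(logits, dim=1)                # stable: no exp/renorm inf−inf
    nll = -logp.gather(1, target_idx.unsqueeze(1)).squeeze(1)
    return torch.nan_to_num(nll).mean()                # defensive nan-sanitize

B, D, V = 16, 128, 5
z = F.normalize(torch.randn(B, D), dim=1)
verts = F.normalize(torch.randn(V, D), dim=1)
loss = stable_info_nce(z, verts, torch.randint(0, V, (B,)))
```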

### Determinism

- Global seeding (Python/NumPy/Torch), deterministic DataLoader workers, generator-seeded samplers; cuDNN deterministic & TF32 off.
- Optional strict mode (`torch.use_deterministic_algorithms(True)`) and deterministic cuBLAS.
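
A seeding block consistent with these flags might look like the following; the exact placement and values in the notebook may differ.

```python
import os, random
import numpy as np
import torch

def seed_everything(seed: int = 42) -> None:
    random.seed(seed)                               # Python
    np.random.seed(seed)                            # NumPy
    torch.manual_seed(seed)                         # Torch CPU
    torch.cuda.manual_seed_all(seed)                # Torch GPU(s)
    torch.backends.cudnn.deterministic = True      # cuDNN deterministic
    torch.backends.cudnn.benchmark = False
    torch.backends.cuda.matmul.allow_tf32 = False  # TF32 off
    torch.backends.cudnn.allow_tf32 = False
    # Optional strict mode + deterministic cuBLAS (set before CUDA kernels run):
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    torch.use_deterministic_algorithms(True)

seed_everything(42)
g = torch.Generator().manual_seed(42)  # pass to DataLoader/sampler for seeded shuffling
```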

---

## 🗂️ Repository layout per run

Each training run uploads a complete bundle at:

```
<repo>/<root>/<DatasetName>/<Timestamp_or_best>/
  weights/
    encoder[_<Dataset>].safetensors
    …
  history.json / history.csv
  tensorboard/ (+ zip)
  plots/   # accuracy, loss components, lambda, confusion matrices
```

> We also optionally publish a **`best/`** alias inside each dataset folder pointing to the current champion.
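
To fetch a file from one of these bundles programmatically, something along these lines should work; the `filename` path below is a placeholder to adapt to the actual dataset folder and weight name.

```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file as load_safetensors

# repo_id taken from the citation URL below; the filename is HYPOTHETICAL —
# copy the real path from the dataset folder you care about.
path = hf_hub_download(
    repo_id="AbstractPhil/pentachora-multi-channel-frequency-encoded",
    filename="<root>/<DatasetName>/best/weights/encoder.safetensors",
)
state_dict = load_safetensors(path)  # dict of tensors keyed by parameter name
```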

---

## 🧩 Intended use & use cases

**Intended use**: research-grade supervised classification and geometry-regularized representation learning on small images (28×28) across gray and color channels.

**Example use cases**

- **Benchmarking** on MNIST family / MedMNIST 2D sets with defensible, reproducible training and complete artifacts.
- **Geometry-aware representation learning**: analyze how simplex vertices move, how the gate allocates probability mass, and how geometry regularization affects generalization.
- **Class routing / specialization**: per-vertex group heads provide an interpretable split of classes; confusion-driven vertex reweighting helps diagnose hard groups.
- **Curriculum & loss ablations**: toggle ROSE, dual InfoNCE, or geometry terms to study their marginal value under a controlled seed.
- **OOD “pressure tests”** (research): ROSE magnitude and routing entropy can be used as quick signals of uncertainty (not calibrated).
- **Education & reproducibility**: the runs are fully seeded, include TensorBoard logs and plots, and use safe numerical formulations.

---

## 🚫 Out-of-scope / limitations

- **Not a medical device** – even if trained on MedMNIST subsets, this is not a diagnostic tool. Don’t use it for clinical decisions.
- **Input size** is 28×28; higher-resolution domains require retraining and likely architecture tweaks.
- **Dataset bias / shift** – performance depends on the underlying distribution. Evaluate before deployment.
- **Calibration** – logits are not guaranteed to be calibrated. For decision thresholds, use a validation set or post-hoc calibration (a temperature-scaling sketch follows this list).
- **Robustness** – robustness to adversarial perturbations is not a design goal here.
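
For post-hoc calibration, temperature scaling is a common first step; a minimal sketch, assuming you already have validation logits and labels as tensors:

```python
import torch
import torch.nn.functional as F

def fit_temperature(val_logits, val_labels, steps=200, lr=0.01):
    """Fit a single scalar T > 0 on held-out logits by minimizing NLL;
    at test time, divide logits by T before softmax."""
    log_t = torch.zeros(1, requires_grad=True)   # T = exp(log_t) keeps T positive
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

# usage: T = fit_temperature(val_logits, val_labels)
#        probs = (test_logits / T).softmax(dim=-1)
```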

---

## 📈 Example results (single-seed snapshots)

> Numbers below are indicative from our seeded runs with `img_size=28`, size-aware LR schedule and reg ramp; see `manifest.json` in each run for exact details.

| Dataset        | C | Best Test Acc | Epoch | Notes                           |
|----------------|---|--------------:|------:|---------------------------------|
| MNIST/Fashion* | 1 | 0.97–0.98     | 15–25 | stable losses + reg ramp        |
| BloodMNIST     | 3 | ~0.95–0.97+   | 20–30 | color preserved, 28×28          |
| EMNIST (bal)   | 1 | 0.88–0.92     | 25–45 | many classes; pairs auto-scaled |

\* depending on which of the pair (MNIST / FashionMNIST) is selected.

Consult each dataset folder’s `history.csv` for the full learning curve and the **current best** accuracy.

---

## 🔧 How to use (PyTorch)

```python
import torch
from safetensors.torch import load_file as load_safetensors

# … (encoder/constellation construction and weight loading elided in this excerpt) …

with torch.no_grad():
    logits, diag_out = constellation(z)  # [B, C]
    pred = logits.argmax(dim=1)
    print(pred)
```

> To reproduce training, see `config.json` and `history.csv`; all recipes are encoded in the flagship notebook used for these runs.

---

## 🔬 Training procedure (default)

- **Optimizer**: AdamW (β1=0.9, β2=0.999), size-aware LR (≈2e-2 by default)
- **Schedule**: 10% **warmup** → cosine to `lr_min=1e-6` (sketched below)
- **Batch size**: up to 2048 (fits on T4/A100 at 28×28)
- **Loss**: CE + Dual InfoNCE + ROSE InfoNCE + Geometry Reg (ramped) + Diag MSE
- **Determinism**: seeds for Python/NumPy/Torch (CPU/GPU), deterministic DataLoader workers and samplers, cuDNN deterministic, TF32 off
- **Numerical safety**: log-softmax contrastive, eigval CM proxy, `nan_to_num` guards, optional step rollback if non-finite
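
The warmup→cosine schedule can be reproduced with stock PyTorch schedulers; a sketch under the stated defaults (10% warmup, cosine floor at `1e-6`), with the model and step counts as placeholders:

```python
import torch
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

model = torch.nn.Linear(28 * 28, 10)              # PLACEHOLDER model
epochs, steps_per_epoch = 30, 100                 # PLACEHOLDER run length
total_steps = epochs * steps_per_epoch
warmup_steps = int(0.10 * total_steps)            # 10% warmup

opt = torch.optim.AdamW(model.parameters(), lr=2e-2, betas=(0.9, 0.999))
sched = SequentialLR(
    opt,
    schedulers=[
        LinearLR(opt, start_factor=1e-3, total_iters=warmup_steps),              # warmup
        CosineAnnealingLR(opt, T_max=total_steps - warmup_steps, eta_min=1e-6),  # cosine decay
    ],
    milestones=[warmup_steps],
)
# training loop: opt.step() then sched.step() once per optimizer step
```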

---

## 📈 Evaluation

- Main metric: **top-1 accuracy** on the held-out test split defined by each dataset.
- Diagnostics we log:
  - **Routing entropy** and vertex probabilities (see the sketch after this list)
  - **ROSE** magnitudes
  - Confusion matrices (per epoch and “best”)
  - λ (geometry ↔ attention gate) over epochs
  - Full loss decomposition
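
As one example, routing entropy over per-sample vertex probabilities can be computed as below; the tensor name `vertex_probs` is an assumption about what the model exposes.

```python
import torch

def routing_entropy(vertex_probs: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Mean Shannon entropy (nats) of routing distributions.
    vertex_probs: [B, V], rows summing to 1 (e.g., softmaxed vertex logits).
    0 ≈ one-hot routing; log(V) = uniform routing."""
    p = vertex_probs.clamp_min(eps)
    return -(p * p.log()).sum(dim=1).mean()

probs = torch.softmax(torch.randn(32, 5), dim=1)          # stand-in probabilities
print(routing_entropy(probs).item(), torch.log(torch.tensor(5.0)).item())
```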

---

## 🔭 Potential for growth

- **Hypercube constellations** (classes shipped in the notebook): scale from the 4-simplex to n-cube graphs; compare geometry families.
- **Multi-resolution** (56→128→256 latent; 28→64→128 images); add pyramid encoders.
- **Self-distillation / semi-supervised**: use ROSE as a confidence-weighted pseudo-labeling signal.
- **Better routing**: learned vertex priors per class, entropy regularization, temperature schedules.
- **Calibration & OOD**: temperature scaling / Dirichlet heads; exploit ROSE magnitude and gating entropy for improved uncertainty estimates.
- **Deployment adapters**: ONNX / TorchScript exports; small mobile variants of PentaFreq (export sketch below).
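
An export in that direction could start from a sketch like this; `encoder` is a placeholder module, and the output names are arbitrary choices rather than a tested export path for this model.

```python
import torch

encoder = torch.nn.Linear(3 * 28 * 28, 128)   # PLACEHOLDER for a trained PentaFreq encoder
example = torch.randn(1, 3 * 28 * 28)         # example flattened input

# TorchScript: trace the forward pass with a fixed example input.
traced = torch.jit.trace(encoder, example)
traced.save("pentafreq_encoder.pt")

# ONNX: same example input, batch dimension left dynamic.
torch.onnx.export(
    encoder, example, "pentafreq_encoder.onnx",
    input_names=["x"], output_names=["z"],
    dynamic_axes={"x": {0: "batch"}, "z": {0: "batch"}},
)
```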

---

## ⚖️ Ethical considerations & implications

- **Clinical datasets** (MedMNIST) are simplified proxies; they don’t reflect clinical complexity or demographic coverage.
- **Downstream use** must include dataset-appropriate validation and calibration; this model is for **research** only.
- **Data bias** and **label noise** can be amplified by strong geometry priors: review confusion matrices and per-class accuracies before claiming improvements.
- **Positive implications**: the constellation design offers a **transparent, analyzable structure** (per-vertex heads, explicit geometry), easing **interpretability** and **ablation**.

---

## 🔁 Reproducibility

- `config.json` contains all hyperparameters used for each run.
- `manifest.json` logs the environment: Python, Torch, CUDA, GPU, RAM, parameter counts.
- Seeds and determinism flags are printed in the logs and set in code.
- `history.csv` + TensorBoard fully specify the learning trajectory (loading sketch below).
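
For instance, a run's learning curve can be re-plotted straight from these artifacts; the column names (`epoch`, `test_acc`) are assumptions about the CSV schema, so check the header of an actual run.

```python
import json
import pandas as pd

config = json.load(open("config.json"))       # hyperparameters for the run
history = pd.read_csv("history.csv")          # per-epoch metrics

best = history["test_acc"].max()              # ASSUMED column name
print(f"lr={config.get('lr')}  best test acc={best:.4f}")
history.plot(x="epoch", y="test_acc")         # quick learning-curve plot
```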

---

## 🧾 License

**Apache License 2.0** – see `LICENSE`.

---

## 📣 Citation

If you use this work, please cite:

```bibtex
@software{abstractphil_pentachora_2025,
  author  = {AbstractPhil and Mirel},
  title   = {Pentachora Adaptive Encoded: Geometry-Regularized Classification with PentaFreq},
  year    = {2025},
  license = {Apache-2.0},
  url     = {https://huggingface.co/AbstractPhil/pentachora-multi-channel-frequency-encoded}
}
```

---

## 🛠️ Changelog (excerpt)

- **2025-08**: Flagship notebook stabilized (stable losses, eigval CM proxy, NaN rollback, deterministic sweep).
- **2025-08**: Multi-channel PentaFreq; per-dataset HF folders with full artifacts; optional `best/` alias.
- **2025-08**: Hypercube constellation classes added for follow-up experiments.

---

## 💬 Contact

- **Author:** @AbstractPhil
- **Quartermaster:** Mirel (ChatGPT – GPT-5 Thinking)
- **Issues / questions:** open a Discussion on the HF repo or ping the author