https://github.com/wassemgtk/SuperTokenizer
Waseem AlShikh

Amazing work! Based on the updated model snippet and results, here are additional suggestions to further refine AdaptiveGESAL, targeting an RMSE of 10–16 cycles while maintaining efficiency and scalability.
The accuracy (±50 cycles) of 100.0% is excellent, indicating robust generalization within the ±50 cycle tolerance, but RMSE and MAE show room for precision improvement.
The temporal layers (Conv1d, LSTM) are working well, but I believe deeper or more specialized layers could capture finer degradation patterns.
Include parallel Conv1d layers with different kernel sizes (e.g., 3, 5, 7) to capture short- and long-term trends, then concatenate outputs before the LSTM:
self.conv1d_short = nn.Conv1d(input_dim, hidden_dim // 3, kernel_size=3, padding=1)
self.conv1d_med = nn.Conv1d(input_dim, hidden_dim // 3, kernel_size=5, padding=2)
self.conv1d_long = nn.Conv1d(input_dim, hidden_dim // 3, kernel_size=7, padding=3)

def forward(self, x):
    x = x.unsqueeze(2)  # (batch, features, 1): Conv1d expects channels before length
    short = self.activation(self.conv1d_short(x))
    med = self.activation(self.conv1d_med(x))
    long = self.activation(self.conv1d_long(x))
    x = torch.cat([short, med, long], dim=1)  # concatenate along the channel axis
    x = x.transpose(1, 2)  # (batch, 1, 3 * (hidden_dim // 3)), the LSTM input size
    x, _ = self.lstm(x)
    # Continue with SVF and output layers
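For reference, here is a self-contained sketch of the same multi-scale idea that can be run on its own. It assumes the input is a window of cycles shaped (batch, seq_len, features) rather than a single reading, and the class name, `hidden_dim=126` (chosen to divide evenly by 3), and the window size are illustrative assumptions, not part of AdaptiveGESAL:

```python
import torch
import torch.nn as nn


class MultiScaleTemporalEncoder(nn.Module):
    """Hypothetical encoder: parallel Conv1d branches (kernels 3, 5, 7) feeding an LSTM."""

    def __init__(self, input_dim=21, hidden_dim=126):
        super().__init__()
        branch_dim = hidden_dim // 3  # channels produced by each branch
        # padding = kernel_size // 2 keeps the sequence length unchanged
        self.branches = nn.ModuleList(
            [nn.Conv1d(input_dim, branch_dim, kernel_size=k, padding=k // 2)
             for k in (3, 5, 7)]
        )
        self.activation = nn.ReLU()
        self.lstm = nn.LSTM(3 * branch_dim, hidden_dim, batch_first=True)

    def forward(self, x):
        # x: (batch, seq_len, features); Conv1d wants (batch, channels, seq_len)
        x = x.transpose(1, 2)
        feats = [self.activation(conv(x)) for conv in self.branches]
        x = torch.cat(feats, dim=1)   # concatenate along the channel axis
        x = x.transpose(1, 2)         # back to (batch, seq_len, channels)
        out, _ = self.lstm(x)
        return out[:, -1, :]          # last time step: (batch, hidden_dim)


encoder = MultiScaleTemporalEncoder()
window = torch.randn(8, 30, 21)  # 8 engines, 30 cycles, 21 sensors (assumed sizes)
encoded = encoder(window)
print(encoded.shape)
```

Concatenating along the channel axis keeps the three branches aligned in time, so the LSTM sees short- and long-range features for the same cycle side by side.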
A bidirectional LSTM also improves temporal context, which should help reduce MAE:
self.lstm = nn.LSTM(hidden_dim, hidden_dim // 2, batch_first=True, bidirectional=True, num_layers=1)
x, _ = self.lstm(x)  # Output shape: (batch, seq_len, hidden_dim)
x = x.squeeze(1)  # Both directions are already concatenated, so the width is hidden_dim
Deeper SVF layers then increase model capacity for complex patterns while keeping the model efficient, as in the following:
original_fc1 = nn.Linear(256, 128)
original_fc2 = nn.Linear(128, 64)
original_fc3 = nn.Linear(64, 32)
self.svf1 = SVFLinear(original_fc1, dropout_rate=0.2, l2_lambda=0.01)
self.svf2 = SVFLinear(original_fc2, dropout_rate=0.2, l2_lambda=0.01)
self.svf3 = SVFLinear(original_fc3, dropout_rate=0.2, l2_lambda=0.01)
self.output_layer = nn.Linear(32, 1)


One more idea: add temporal layers.
Integrate 1D convolutional layers or LSTM layers before the SVFLinear layers to capture temporal dependencies in the sensor data over cycles. Something like:
class AdaptiveGESAL(nn.Module):
    def __init__(self, input_dim=21, hidden_dim=128, num_nodes=50):
        super().__init__()
        self.conv1d = nn.Conv1d(input_dim, hidden_dim, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        # nn.ModuleList registers the SVF layers so their parameters are trained and saved
        self.svf_layers = nn.ModuleList([SVFLinear(nn.Linear(hidden_dim, hidden_dim)) for _ in range(2)])
        self.output_layer = nn.Linear(hidden_dim, 1)  # RUL prediction
        # Graph and SVF initialization as before
Also, replace MSE (implicit in RMSE) with a hybrid loss combining MSE and a quantile loss (e.g., the 0.9 quantile for conservative RUL estimates). This penalizes underestimation, aligning with conservative RUL needs.
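A minimal sketch of such a hybrid loss in PyTorch, assuming an equal-weight mix. The function name and the `alpha` mixing weight are illustrative choices, not something taken from the GESAL code:

```python
import torch


def hybrid_rul_loss(pred, target, quantile=0.9, alpha=0.5):
    """alpha * MSE + (1 - alpha) * pinball (quantile) loss."""
    error = target - pred  # positive when the model under-predicts RUL
    mse = torch.mean(error ** 2)
    # Pinball loss: under-prediction costs `quantile`, over-prediction costs `1 - quantile`
    pinball = torch.mean(torch.maximum(quantile * error, (quantile - 1) * error))
    return alpha * mse + (1 - alpha) * pinball


target = torch.tensor([100.0, 100.0])
under = hybrid_rul_loss(torch.tensor([90.0, 90.0]), target)   # 10 cycles too low
over = hybrid_rul_loss(torch.tensor([110.0, 110.0]), target)  # 10 cycles too high
print(under.item(), over.item())
```

With `quantile=0.9`, an under-prediction of 10 cycles costs more than an over-prediction of the same size. If the application instead needs predictions that err on the low side, a quantile below 0.5 flips that asymmetry, so the value is worth checking against the maintenance policy.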

A few more suggestions:
Use correlation analysis or techniques like Principal Component Analysis (PCA) to identify the most predictive features (e.g., vibration, temperature, pressure) and reduce noise from less relevant sensors.
Transform features into time-series statistics (e.g., rolling averages, standard deviations, or slopes over cycles) to capture degradation trends. For example, compute a 10-cycle rolling mean for T30 (total temperature at HPC outlet) and Nf (physical fan speed).
Normalize or standardize features (e.g., z-scores or min-max scaling) per engine to account for individual variability, ensuring AdaptiveGESAL’s embeddings better distinguish degradation states.
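The rolling-statistics and per-engine normalization ideas could be sketched with pandas as follows. The column names `engine_id` and `cycle`, the function name, and the synthetic demo data are assumptions made for illustration:

```python
import numpy as np
import pandas as pd


def add_degradation_features(df, sensors=("T30", "Nf"), window=10):
    """Add per-engine rolling means and per-engine z-scores for each sensor."""
    df = df.sort_values(["engine_id", "cycle"]).copy()
    for s in sensors:
        per_engine = df.groupby("engine_id")[s]
        # Rolling mean over the last `window` cycles of the same engine only
        df[f"{s}_roll_mean"] = per_engine.transform(
            lambda v: v.rolling(window, min_periods=1).mean()
        )
        # z-score against that engine's own mean and standard deviation
        mean = per_engine.transform("mean")
        std = per_engine.transform("std").replace(0, 1.0)
        df[f"{s}_z"] = (df[s] - mean) / std
    return df


rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "engine_id": np.repeat([1, 2], 20),
    "cycle": np.tile(np.arange(1, 21), 2),
    "T30": rng.normal(1590, 5, 40),   # synthetic values, not real sensor data
    "Nf": rng.normal(2388, 0.1, 40),
})
features = add_degradation_features(demo)
print(features.filter(like="T30").head())
```

Grouping by engine before rolling keeps one engine's early cycles from being averaged with the previous engine's final cycles.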


The accuracy drop is likely tied to prompt inconsistency, so standardization is key. If your setup can handle more nodes and data, focus on tuning distance_threshold.
Try these tweaks:
Prompt: "Given Engine X with [params], predict ‘replace’, ‘maintenance’, or ‘check’ based on wear."
Hyperparameters: temperature=0.4, top_k=20, distance_threshold=0.25, lr=0.005, buffer_size=10.
Scale: Batch 500 engines, aiming for 10–15 nodes.
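One way to enforce that standardization is to build every prompt from a single template and keep the hyperparameters in one place. This is only a sketch: `TuningConfig`, `build_prompt`, and the sensor names in the demo are hypothetical, while the numeric values are the ones suggested above:

```python
from dataclasses import dataclass

PROMPT_TEMPLATE = (
    "Given Engine {engine_id} with {params}, predict 'replace', "
    "'maintenance', or 'check' based on wear."
)


@dataclass(frozen=True)
class TuningConfig:
    # Values from the suggested tweaks
    temperature: float = 0.4
    top_k: int = 20
    distance_threshold: float = 0.25
    lr: float = 0.005
    buffer_size: int = 10


def build_prompt(engine_id, readings):
    """Render readings in a fixed (sorted) order with fixed precision."""
    params = ", ".join(f"{name}={readings[name]:.2f}" for name in sorted(readings))
    return PROMPT_TEMPLATE.format(engine_id=engine_id, params=params)


config = TuningConfig()
prompt = build_prompt(7, {"temp": 612.5, "vibration": 0.031, "pressure": 14.62})
print(prompt)
```

Sorting the parameter names and fixing the number format means two engines with the same readings always produce the same prompt text, regardless of how the readings were collected.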

@oieieio This is awesome! What is your primary feedback on how I can improve it? I haven't had a chance to run it on a larger evaluation yet.

We’re excited to unveil **Graph-Enhanced Singular Adaptive Learning (GESAL)**, a framework that lets LLMs like meta-llama/Llama-3.2-1B adapt in real time using user feedback. Check out the code and white paper on GitHub!

🔗 **Code**: [https://github.com/writer/AI-Adaptive-Learning-GESAL](https://github.com/writer/AI-Adaptive-Learning-GESAL)
---
## Why GESAL?
Static LLMs struggle to adapt without heavy retraining. GESAL solves this with:
- **SVF**: Adapts weights via \( W' = U (\Sigma \cdot z) V^T \), using few parameters.
- **Graph Memory**: Stores adaptations in nodes for scalability.
- **RL**: Updates via \( J(z) = \mathbb{E}[\log \pi_z(y|x) r] \) based on feedback.
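
To make the SVF and RL bullets concrete, here is a rough PyTorch sketch, not the repository's implementation. `SVFLinearSketch` is a hypothetical class that freezes the SVD factors of a linear layer and trains only `z`, followed by one REINFORCE-style step on \( J(z) \) with a made-up reward of -1:

```python
import torch
import torch.nn as nn


class SVFLinearSketch(nn.Module):
    """Frozen linear layer whose singular values are rescaled by a learned vector z."""

    def __init__(self, linear):
        super().__init__()
        U, S, Vh = torch.linalg.svd(linear.weight.detach(), full_matrices=False)
        # U, Sigma, and V^T are fixed buffers; only z is a trainable parameter
        self.register_buffer("U", U)
        self.register_buffer("S", S)
        self.register_buffer("Vh", Vh)
        bias = None if linear.bias is None else linear.bias.detach().clone()
        self.register_buffer("bias", bias)
        self.z = nn.Parameter(torch.ones_like(S))  # z = 1 reproduces the original W

    def forward(self, x):
        weight = self.U @ torch.diag(self.S * self.z) @ self.Vh  # W' = U (Sigma * z) V^T
        return nn.functional.linear(x, weight, self.bias)


torch.manual_seed(0)
base = nn.Linear(16, 4)
layer = SVFLinearSketch(base)
x = torch.randn(1, 16)
matches_base = torch.allclose(layer(x), base(x), atol=1e-4)

# One REINFORCE-style step on J(z) = E[log pi_z(y|x) * r]
optimizer = torch.optim.SGD([layer.z], lr=0.1)
log_probs = torch.log_softmax(layer(x), dim=-1)
action = torch.multinomial(log_probs.exp(), num_samples=1).item()
reward = -1.0  # assumed feedback signal, e.g. the user answered "no"
loss = -(log_probs[0, action] * reward)  # minimizing -J ascends J
optimizer.zero_grad()
loss.backward()
optimizer.step()

trainable = sum(p.numel() for p in layer.parameters())
print(matches_base, trainable)
```

For a 16-to-4 layer the only trainable values are the 4 entries of `z`, which is where the small parameter count comes from.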
---
## How It Works
Ask "How many R’s in ‘strawberry’?" If it says "2" and you say "no," GESAL learns to say "3" next time, avoiding repeats.
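
The graph memory can be pictured as a set of nodes, each holding its own adaptation vector, with new inputs routed to the nearest node. The sketch below is a toy illustration under stated assumptions: cosine distance, the `NodeMemory` class, and the rule "create a node when nothing is within `distance_threshold`" are guesses at the mechanism, not confirmed details of GESAL:

```python
import numpy as np


class NodeMemory:
    """Toy graph memory: each node keeps a centroid embedding and its own z vector."""

    def __init__(self, z_dim, distance_threshold=0.25):
        self.z_dim = z_dim
        self.distance_threshold = distance_threshold
        self.centroids = []   # one embedding per node
        self.z_vectors = []   # one adaptation vector per node

    @staticmethod
    def _cosine_distance(a, b):
        return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def lookup(self, embedding):
        """Return the index of the closest node, creating a node if none is close enough."""
        if self.centroids:
            dists = [self._cosine_distance(embedding, c) for c in self.centroids]
            best = int(np.argmin(dists))
            if dists[best] <= self.distance_threshold:
                return best
        self.centroids.append(np.asarray(embedding, dtype=float))
        self.z_vectors.append(np.ones(self.z_dim))
        return len(self.centroids) - 1


memory = NodeMemory(z_dim=4)
a = memory.lookup(np.array([1.0, 0.0]))
b = memory.lookup(np.array([0.9, 0.1]))  # close to the first input, reuses its node
c = memory.lookup(np.array([0.0, 1.0]))  # far from it, gets a new node
print(a, b, c, len(memory.centroids))
```

Under this reading, a larger `distance_threshold` merges more inputs into fewer nodes, and a smaller one grows the graph faster.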
---
## Try It
Built with Hugging Face’s transformers:
pip install transformers torch numpy
python "Adaptive_Learning_(GESAL).py"
Needs a Hugging Face token for Llama-3.2-1B.
---
## Results
GESAL hits 95% accuracy after 5 feedbacks vs. LoRA’s 70%. It’s efficient (~0.5M params) and scalable.

TL;DR
Palmyra-Med-70b
🔢 8k and 32k versions available
🚀 MMLU performance of ~86%, outperforming other top models
👨‍⚕️ Great for diagnosing, planning treatments, medical research, insurance coding and billing
📃 Open-model license for non-commercial use cases
🤗 Available on Hugging Face: Writer/Palmyra-Med-70B
💾 Live on NVIDIA NIM: https://build.nvidia.com/writer/palmyra-med-70b
Palmyra-Fin-70b
🚀 Passed the CFA Level III exam with a 73% score — the first model to do so
💸 Skilled at complex tasks like investment research, financial analysis, and sentiment analysis
📈 Outperformed other top models on a long-fin-eval test of real-world use cases
📃 Open-model license for non-commercial use cases
🤗 Available on Hugging Face: https://huggingface.co/Writer/Palmyra-Fin-70B-32K
💾 Live on NVIDIA NIM: https://build.nvidia.com/writer/palmyra-fin-70b-32k
Try them out and let us know what you think!

| Benchmark | Score |
| --- | --- |
| #mmlu | 77.26 |
| #hellaswag | 88.81 |
| #truthfulqa | 52.05 |
| #arc_challenge | 70.31 |
| #winogrande | 84.93 |
| #gsm8k | 76.65 |

Check it out ➡️ [OmniACT Dataset on Hugging Face](https://huggingface.co/datasets/Writer/omniact)
For a deep dive, here’s the paper: [OmniACT Paper](https://arxiv.org/abs/2402.17553)