---

title: AI Polymer Classification
emoji: 🔬
colorFrom: indigo
colorTo: green
sdk: streamlit
app_file: app.py
pinned: false
license: apache-2.0
---


# 🔬 AI-Driven Polymer Aging Prediction and Classification System

A research project developed as part of AIRE 2025. This system applies deep learning to spectral data to classify polymer aging, a critical proxy for recyclability, using a fully reproducible and modular ML pipeline.

The broader research vision is a multi-modal evaluation platform, benchmarking not only Raman spectra but also image-based models and FTIR spectral data, ensuring reproducibility, extensibility, and scientific rigor.

---

## 🎯 Project Objective

- Build a validated machine learning system for classifying polymer spectra (predict degradation levels as a proxy for recyclability)
- Evaluate and compare multiple CNN architectures, beginning with Figure2CNN and ResNet variants, and expand to additional trained models.
- Ensure scientific reproducibility through structured diaignostics and artifact control
- Support sustainability and circular materials research through spectrum-based classification.

  **Reference (for Figure2CNN baseline):**

  > Neo, E.R.K., Low, J.S.C., Goodship, V., Debattista, K. (2023).
  > Deep learning for chemometric analysis of plastic spectral data from infrared and Raman databases.
  > Resources, Conservation & Recycling, 188, 106718.
  > https://doi.org/10.1016/j.resconrec.2022.106718

---

## 🧠 Model Architectures

| Model| Description |
|------|-------------|
| `Figure2CNN`  | Baseline model from literature |
| `ResNet1D`    | Deeper candidate model with skip connections |
| `ResNet18Vision` | Image-focused CNN architecture, retrained on polymer dataset (roadmap) |

  Future expansions will add additional trained CNNs, supporting direct benchmarking and comparative reporting.

---

## 📁 Project Structure (Cleaned and Current)

```text
ml-polymer-recycling/
├── datasets/
├── models/           # Model architectures
├── scripts/          # Training, inference, utilities
├── outputs/          # Artifacts: models, logs, plots
├── docs/             # Documentation & reports
└── environment.yml   # (local) Conda execution environment
```

![ml-polymer-gitdiagram-0](https://github.com/user-attachments/assets/bb5d93dc-7ab9-4259-8513-fb680ae59d64)

---

## ✅ Current Status

| Track     | Status                  | Test Accuracy      |
|-----------|-------------------------|--------------------|
| **Raman** | ✅ Active & validated    | **87.81% ± 7.59%** |
| **Image** | 🚧 Planned Expansion    | N/A                |
| **FTIR**  | ⏸️ Deferred/Modularized | N/A                |
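
A figure of the form "mean ± spread" is commonly produced by averaging per-fold test accuracies. The sketch below shows one way such a summary could be computed. It assumes the spread is a sample standard deviation across folds, which this README does not state. `summarize_folds` and the fold values are hypothetical placeholders, not results from this project.

```python
import json
import statistics


def summarize_folds(fold_accuracies):
    """Return mean and sample standard deviation (in percent) over CV folds."""
    mean = statistics.mean(fold_accuracies)
    std = statistics.stdev(fold_accuracies)  # sample std (n - 1 denominator); an assumption
    return {
        "mean_pct": round(mean * 100, 2),
        "std_pct": round(std * 100, 2),
        "n_folds": len(fold_accuracies),
    }


# Placeholder accuracies for illustration only; NOT results from this project.
example_folds = [0.80, 0.90, 0.85, 0.95, 0.75]
summary = summarize_folds(example_folds)
print(json.dumps(summary))
```

If the pipeline reports a population standard deviation instead, `statistics.pstdev` would be the matching call.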

## 🔬 Key Features

- ✅ 10-Fold Stratified Cross-Validation
- ✅ CLI Training: `train_model.py`
- ✅ CLI Inference: `run_inference.py`
- ✅ Output artifact naming per model
- ✅ Raman-only preprocessing with baseline correction, smoothing, normalization
- ✅ Structured diagnostics JSON (accuracies, confusion matrices)
- ✅ Canonical validation script (`validate_pipeline.sh`) confirms reproducibility of all core components
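
To make the preprocessing feature above more concrete, here is a minimal sketch of a resample, baseline-correct, smooth, normalize chain using only NumPy. The specific methods (polynomial baseline, moving-average smoothing, min-max scaling) and the name `preprocess_spectrum` are assumptions for illustration. The project's scripts may use different algorithms.

```python
import numpy as np


def preprocess_spectrum(x, y, target_len=4000, baseline_degree=2, smooth_window=11):
    """Resample, baseline-correct, smooth, and normalize one spectrum (illustrative)."""
    if smooth_window % 2 == 0:
        raise ValueError("smooth_window must be odd so the output length is preserved")
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # 1. Resample onto a uniform grid of target_len points (linear interpolation).
    order = np.argsort(x)  # np.interp requires increasing x
    x_new = np.linspace(x.min(), x.max(), target_len)
    y_new = np.interp(x_new, x[order], y[order])

    # 2. Baseline correction (assumed method): subtract a low-order polynomial
    #    fitted on a rescaled axis in [-1, 1] for numerical stability.
    t = np.linspace(-1.0, 1.0, target_len)
    y_new = y_new - np.polyval(np.polyfit(t, y_new, baseline_degree), t)

    # 3. Smoothing (assumed method): centered moving average with edge padding.
    padded = np.pad(y_new, smooth_window // 2, mode="edge")
    kernel = np.ones(smooth_window) / smooth_window
    y_new = np.convolve(padded, kernel, mode="valid")

    # 4. Min-max normalization to [0, 1]; flat spectra map to all zeros.
    span = y_new.max() - y_new.min()
    y_new = (y_new - y_new.min()) / span if span > 0 else np.zeros_like(y_new)
    return x_new, y_new


# Synthetic spectrum: one Gaussian peak on a sloped background (not project data).
x_demo = np.linspace(200.0, 3200.0, 1500)
y_demo = np.exp(-((x_demo - 1000.0) / 30.0) ** 2) + 0.001 * x_demo
x_out, y_out = preprocess_spectrum(x_demo, y_demo)
print(x_out.shape, y_out.shape)
```

The default `target_len=4000` mirrors the `--target-len 4000` value used in the commands in this README; the other defaults are arbitrary.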

---

**Environments:**

```bash
# Local
git checkout main
conda env create -f environment.yml
conda activate polymer_env

# HPC
git checkout hpc-main
conda env create -f environment_hpc.yml
conda activate polymer_env
```

## 📊 Sample Training & Inference

### Training (10-Fold CV)

```bash
python scripts/train_model.py --model resnet --target-len 4000 --baseline --smooth --normalize
```
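
For readers unfamiliar with stratified cross-validation, the following dependency-free sketch shows the idea behind the 10-fold split: each class is dealt across folds so that every fold keeps roughly the same class balance. `stratified_kfold_indices` is a hypothetical helper written for this illustration. The training script may instead rely on a library such as scikit-learn's `StratifiedKFold`.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, n_splits=10, seed=42):
    """Yield (train_idx, test_idx) pairs with per-class proportions roughly preserved."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets a similar share.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, test_idx


# Toy labels (60 of class 0, 40 of class 1); not the project's dataset.
toy_labels = [0] * 60 + [1] * 40
splits = list(stratified_kfold_indices(toy_labels))
for train_idx, test_idx in splits[:2]:
    positives = sum(toy_labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), positives)
```

With these toy labels every test fold holds 10 samples, 4 of them from class 1, matching the 60/40 balance of the full set.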

### Inference (Raman)

```bash
python scripts/run_inference.py --target-len 4000 \
  --input datasets/rdwp/sample123.txt --model outputs/resnet_model.pth \
  --output outputs/inference/prediction.txt
```

### Inference Output Example:

```text
Predicted Label: 1 True Label: 1
Raw Logits: [[-569.544, 427.996]]
```
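
Raw logits are unnormalized scores; the predicted label is the index of the largest one. The sketch below shows how the logits from the example output could be turned into class probabilities with a numerically stable softmax. The `softmax` helper is illustrative, and whether `run_inference.py` reports probabilities at all is not stated here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-569.544, 427.996]  # values copied from the example output
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(f"Predicted Label: {predicted}, probabilities: {probs}")
```

With a gap this large between the two logits, the softmax saturates: class 1 receives essentially all of the probability mass.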

### Validation Script (Raman Pipeline)

```bash
./validate_pipeline.sh
# Runs preprocessing, training, inference, and plotting checks
# Confirms artifact integrity and logs test results
```

---

## 📚 Dataset Resources

| Type  | Dataset | Source |
|-------|---------|--------|
| Raman | RDWP    | [A Raman database of microplastics weathered under natural environments](https://data.mendeley.com/datasets/kpygrf9fg6/1) |

Datasets should be downloaded separately and placed here:

```text
datasets/
└── rdwp/
    ├── sample1.txt
    ├── sample2.txt
    └── ...
```

These files are intentionally excluded from version control via `.gitignore`.

---

## 🛠 Dependencies

- `Python 3.10+`
- `Conda, Git`
- `PyTorch (CPU & CUDA)`
- `NumPy, SciPy, Pandas`
- `Scikit-learn`
- `Matplotlib, Seaborn`
- `ArgParse, JSON`

---

## 🧑‍🤝‍🧑 Contributors

- **Dr. Sanmukh Kuppannagari**: Research Mentor
- **Dr. Metin Karailyan**: Research Mentor
- **Jaser H.**: AIRE 2025 Intern, Developer

---

## 🎯 Strategic Expansion Objectives (Roadmap)

> The roadmap defines three major expansion paths designed to broaden the system’s capabilities and impact:

1. **Model Expansion: Multi-Model Dashboard**

    > The dashboard will evolve into a hub for multiple model architectures rather than being tied to a single baseline. Planned work includes:


   - **Retraining & Fine-Tuning**: Incorporating publicly available vision models and retraining them with the polymer dataset.
   - **Model Registry**: Automatically detecting available .pth weights and exposing them in the dashboard for easy selection.
   - **Side-by-Side Reporting**: Running comparative experiments and reporting each model's accuracy and diagnostics in a standardized format.
   - **Reproducible Integration**: Maintaining modular scripts and pipelines so each model's results can be replicated without conflict.

   This ensures flexibility for future research and transparency in performance comparisons.

2. **Image Input Modality**

    > The system will support classification on images as an additional modality, extending beyond spectra. Key features will include:


   - **Upload Support**: Users can upload single images or batches directly through the dashboard.
   - **Multi-Model Execution**: Selected models from the registry can be applied to all uploaded images simultaneously.
   - **Batch Results**: Output will be returned in a structured, accessible way, showing both individual predictions and aggregate statistics.
   - **Enhanced Feedback**: Outputs will include predicted class, model confidence, and potentially annotated image previews.

   This expands the system toward a multi-modal framework, supporting broader research workflows.

3. **FTIR Dataset Integration**

    > Although previously deferred, FTIR support will be added back in a modular, distinct fashion. Planned steps are:


    - **Dedicated Preprocessing**: Tailored scripts to handle FTIR-specific signal characteristics (multi-layer handling, baseline correction, normalization).
    - **Architecture Compatibility**: Ensuring existing and retrained models can process FTIR data without mixing it with Raman workflows.
    - **UI Integration**: Introducing FTIR as a separate option in the modality selector, keeping Raman, Image, and FTIR workflows clearly delineated.
    - **Phased Development**: Implementation details to be refined during meetings to ensure scientific rigor.

    This guarantees FTIR becomes a supported modality without undermining the validated Raman foundation.
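
As a rough illustration of the Model Registry item in the roadmap above, the sketch below scans a directory for `.pth` files and maps each to a display name. `discover_models`, the default `outputs` directory, and the use of the file stem as the name are all assumptions. The registry itself is planned work and its real design is not defined here.

```python
from pathlib import Path


def discover_models(weights_dir="outputs"):
    """Map a display name (file stem) to each .pth file found under weights_dir."""
    root = Path(weights_dir)
    if not root.is_dir():
        return {}
    # Sorted for a stable dashboard ordering; files with the same stem in
    # different subfolders would overwrite each other in this simple version.
    return {path.stem: path for path in sorted(root.rglob("*.pth"))}


registry = discover_models("outputs")
for name, path in registry.items():
    print(f"{name}: {path}")
```

A dashboard could populate its model selector from the keys of this mapping and load the chosen weights only when a model is selected.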


## 🔑 Guiding Principles

- **Preserve the Raman baseline** as the reproducible ground truth.
- **Additive modularity**: Models, images, and FTIR added as clean, distinct layers rather than overwriting core functionality.
- **Transparency & reproducibility**: All expansions documented, tested, and logged with clear outputs.
- **Future-oriented design**: Workflows structured to support ongoing collaboration and successor-safe research.