As soon as `inf` or `nan` is detected in at least one element of the activations or weights, the program will assert and print a report like this (this was caught with `google/mt5-small` under fp16 mixed precision):
```
Detected inf/nan during batch_number=0
Last 21 forward frames:
abs min  abs max  metadata
                  encoder.block.1.layer.1.DenseReluDense.dropout Dropout
0.00e+00 2.57e+02 input[0]
0.00e+00 2.85e+02 output
[...]
                  encoder.block.2.layer.0 T5LayerSelfAttention
6.78e-04 3.15e+03 input[0]
2.65e-04 3.42e+03 output[0]
             None output[1]
2.25e-01 1.00e+04 output[2]
                  encoder.block.2.layer.1.layer_norm T5LayerNorm
8.69e-02 4.18e-01 weight
2.65e-04 3.42e+03 input[0]
1.79e-06 4.65e+00 output
                  encoder.block.2.layer.1.DenseReluDense.wi_0 Linear
2.17e-07 4.50e+00 weight
1.79e-06 4.65e+00 input[0]
2.68e-06 3.70e+01 output
                  encoder.block.2.layer.1.DenseReluDense.wi_1 Linear
8.08e-07 2.66e+01 weight
1.79e-06 4.65e+00 input[0]
1.27e-04 2.37e+02 output
                  encoder.block.2.layer.1.DenseReluDense.dropout Dropout
0.00e+00 8.76e+03 input[0]
0.00e+00 9.74e+03 output
                  encoder.block.2.layer.1.DenseReluDense.wo Linear
1.01e-06 6.44e+00 weight
0.00e+00 9.74e+03 input[0]
3.18e-04 6.27e+04 output
                  encoder.block.2.layer.1.DenseReluDense T5DenseGatedGeluDense
1.79e-06 4.65e+00 input[0]
3.18e-04 6.27e+04 output
                  encoder.block.2.layer.1.dropout Dropout
3.18e-04 6.27e+04 input[0]
0.00e+00      inf output
```
The example output has been trimmed in the middle for brevity. The second column shows the value of the absolutely largest element, so if you take a closer look at the last few frames, you can see that the inputs and outputs were in the range of 1e4.
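Each row of the report boils down to two numbers per tensor, the minimum and maximum of the absolute values of its elements, plus a check for any `inf`/`nan` element. As a minimal, framework-free sketch of that per-frame computation (the helper names here are hypothetical, not the actual library implementation, and a flat list of floats stands in for a tensor):

```python
import math

def frame_row(values, label):
    # Format one report row: "abs min  abs max  metadata".
    # Assumes all elements are finite; inf/nan frames are caught separately.
    abs_vals = [abs(v) for v in values]
    return f"{min(abs_vals):8.2e} {max(abs_vals):8.2e} {label}"

def has_inf_nan(values):
    # The detector trips as soon as any single element is inf or nan.
    return any(math.isinf(v) or math.isnan(v) for v in values)

print(frame_row([3.18e-04, -6.27e+04], "input[0]"))  # -> 3.18e-04 6.27e+04 input[0]
print(has_inf_nan([0.0, float("inf")]))              # -> True: this frame triggers the report
```

Note that `abs()` is applied before taking the min and max, which is why a tensor full of small negative values still shows a small "abs min" rather than a large negative number.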
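For context on why a magnitude around 6e4 matters here: the largest finite fp16 value is 65504, so the `6.27e+04` output above is already at the very edge of the format's range, and the subsequent step overflows to `inf`. One way to probe that limit from the Python standard library alone is `struct`'s half-precision `'e'` format (a quick illustration, not part of the detection tooling):

```python
import struct

def fits_in_fp16(x):
    # struct's 'e' format packs IEEE 754 half precision (binary16);
    # packing raises OverflowError when the value rounds to infinity.
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False

print(fits_in_fp16(6.27e4))  # still representable (largest finite fp16 is 65504)
print(fits_in_fp16(7.0e4))   # rounds past 65504 and overflows in fp16
```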