Update README.md
README.md
CHANGED
@@ -29,17 +29,33 @@ Install [Model Generator](https://github.com/genbio-ai/modelgenerator).
|
|
29 |
|
30 |
#### Outputs:
|
31 |
- The evaluation score will be printed on the console.
|
32 |
-
- The generated sequences will be stored
|
33 |
-
|
34 |
-
|
35 |
-
|
36 |
-
|
37 |
-
|
38 |
-
|
39 |
-
|
40 |
-
|
41 |
-
|
42 |
-
43 |
|
44 |
|
45 |
#### Note:
|
|
|
29 |
|
30 |
#### Outputs:
|
31 |
- The evaluation score will be printed on the console.
|
32 |
+
- The generated sequences will be stored in the folder `proteinIF_outputs/`. There will be two output files:
|
33 |
+
- **`./proteinIF_outputs/designed_sequences.pkl`**: This file contains the raw token (amino-acid) IDs of the ground-truth sequences (`"true_seq"`) and of the sequences predicted by our method (`"pred_seq"`), stored as NumPy arrays. An example:
|
34 |
+
```
|
35 |
+
{
|
36 |
+
'true_seq': [
|
37 |
+
array([[ 4, 8, 4, 3, 12, 5, 2, 11, 16, 15, 5, 1, 11, ...]]), ...
|
38 |
+
],
|
39 |
+
'pred_seq': [
|
40 |
+
array([[ 8, 2, 4, 3, 10, 6, 2, 11, 16, 15, 6, 1, 11, ...]]), ...
|
41 |
+
]
|
42 |
+
}
|
43 |
+
```
|
44 |
+
- **`./proteinIF_outputs/results_acc_<median_accuracy>.txt`** (where `<median_accuracy>` is the median accuracy over all test samples):
|
45 |
+
- Here, for each protein in the test set, we have three lines of information:
|
46 |
+
- Line1: Identity of the protein (as '`name=<PDB_ID>.<CHAIN_ID>`'), length of the sequence (as '`L=<length_of_sequence>`'), and the recovery rate/accuracy for that protein sequence (as '`Recovery=<recovery_rate_of_sequence>`')
|
47 |
+
- Line2: *Single-letter representation* of amino-acids of the ground truth sequences (as `true:VTVGKSAPYFSL...`)
|
48 |
+
- Line3: *Single-letter representation* of the amino acids of the sequence predicted by our method (as `pred:TAVGDEAPYFEL...`)
|
49 |
+
- An example file content:
|
50 |
+
```
|
51 |
+
>name=3fkf.A | L=141 | Recovery=0.5957446694374084
|
52 |
+
true:VTVGKSAPYFSLPNEKGEKLSRSAERFRNRYLLLNFWASWCDPQPEANAELKRLNKEYKKNKNFAMLGISLDIDREAWETAIKKDTLSWDQVCDFTGLSSETAKQYAILTLPTNILLSPTGKILARDIQGEALTGKLKELL
|
53 |
+
pred:TAVGDEAPYFELPDLEGKKLSLDSEEFKNKYLLLDFWASWCLPCREEIAELKELYRRFAKNKKFAILGVSADTDKEAWLKAVKEDNLRWTQVSDFKGWDSEVFKNYNVQSLPENILLSPEGKILARGIRGEALRNKLKELL
|
54 |
+
|
55 |
+
>name=2d9e.A | L=121 | Recovery=0.7685950398445129
|
56 |
+
true:GSSGSSGFLILLRKTLEQLQEKDTGNIFSEPVPLSEVPDYLDHIKKPMDFFTMKQNLEAYRYLNFDDFEEDFNLIVSNCLKYNAKDTIFYRAAVRLREQGGAVLRQARRQAEKMGSGPSSG
|
57 |
+
pred:GSSGSSGRLTLLRETLEQLQERDTGWVFSEPVPLSEVPDYLDVIDHPMDFSTMRRKLEAHRYLSFDEFERDFNLIVENCRKYNAKDTVFYRAAVRLQAQGGAILRKARRDVESLGSGPSSG
|
58 |
+
```
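
The two output files described above could be read back with a short script along these lines. This is a minimal sketch, not part of the repository: the names `Record`, `parse_results`, `positional_recovery`, and `load_designed_sequences` are hypothetical, the parser assumes exactly the three-line layout shown in the example, and the recovery helper assumes that recovery is the fraction of positions where the predicted residue matches the ground truth.

```python
import pickle
from dataclasses import dataclass
from statistics import median


@dataclass
class Record:
    name: str        # "<PDB_ID>.<CHAIN_ID>"
    length: int      # value of the L= field
    recovery: float  # value of the Recovery= field
    true_seq: str    # single-letter ground-truth sequence
    pred_seq: str    # single-letter predicted sequence


def parse_results(text):
    """Parse the three-lines-per-protein results format into Record objects."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    records = []
    for i in range(0, len(lines) - 2, 3):
        header, true_line, pred_line = lines[i:i + 3]
        # Header looks like: >name=3fkf.A | L=141 | Recovery=0.59...
        fields = dict(
            part.strip().split("=", 1)
            for part in header.lstrip(">").split("|")
        )
        records.append(Record(
            name=fields["name"],
            length=int(fields["L"]),
            recovery=float(fields["Recovery"]),
            true_seq=true_line.split(":", 1)[1],
            pred_seq=pred_line.split(":", 1)[1],
        ))
    return records


def positional_recovery(true_seq, pred_seq):
    """Fraction of matching positions (assumed definition of recovery)."""
    matches = sum(t == p for t, p in zip(true_seq, pred_seq))
    return matches / len(true_seq)


def load_designed_sequences(path):
    """Load the pickled dict with 'true_seq' and 'pred_seq' entries.

    Assumes a standard pickle file; NumPy must be installed to unpickle
    the arrays. Only load pickle files from sources you trust.
    """
    with open(path, "rb") as f:
        return pickle.load(f)
```

For example, `median(r.recovery for r in parse_results(text))` gives the median of the per-protein `Recovery` values in a results file whose contents have been read into `text`.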
|
59 |
|
60 |
|
61 |
#### Note:
|