smahbub committed
Commit 577347b (verified) · 1 Parent(s): bddf0f4

Update README.md

Files changed (1)
  1. README.md +27 -11
README.md CHANGED
@@ -29,17 +29,33 @@ Install [Model Generator](https://github.com/genbio-ai/modelgenerator).
29
 
30
  #### Outputs:
31
  - The evaluation score will be printed on the console.
32
- - The generated sequences will be stored in `./proteinIF_outputs/designed_sequences.pkl`. The content of this file looks as follows, where we have the token (amino-acid) ids of the ground truth sequences (`"true_seq"`) and predicted sequences by our method (`"pred_seq"`), stored as numpy arrays.
33
- ```
34
- {
35
- 'true_seq': [
36
- array([[ 4, 8, 4, 3, 12, 5, 2, 11, 16, 15, 5, 1, 11, ...]]), ...
37
- ],
38
- 'pred_seq': [
39
- array([[ 8, 2, 4, 3, 10, 6, 2, 11, 16, 15, 6, 1, 11, ...]]), ...
40
- ]
41
- }
42
- ```
43
 
44
 
45
  #### Note:
 
29
 
30
  #### Outputs:
31
  - The evaluation score will be printed on the console.
32
+ - The generated sequences will be stored in the folder `proteinIF_outputs/`. There will be two output files:
33
+ - **`./proteinIF_outputs/designed_sequences.pkl`**: This file contains the raw token (amino-acid) IDs of the ground truth sequences (`"true_seq"`) and of the sequences predicted by our method (`"pred_seq"`), stored as numpy arrays. An example:
34
+ ```
35
+ {
36
+ 'true_seq': [
37
+ array([[ 4, 8, 4, 3, 12, 5, 2, 11, 16, 15, 5, 1, 11, ...]]), ...
38
+ ],
39
+ 'pred_seq': [
40
+ array([[ 8, 2, 4, 3, 10, 6, 2, 11, 16, 15, 6, 1, 11, ...]]), ...
41
+ ]
42
+ }
43
+ ```
44
+ - **`./proteinIF_outputs/results_acc_<median_accuracy>.txt`** (where `<median_accuracy>` is the median accuracy computed over all the test samples):
45
+ - Here, for each protein in the test set, we have three lines of information:
46
+ - Line1: Identity of the protein (as '`name=<PDB_ID>.<CHAIN_ID>`'), length of the sequence (as '`L=<length_of_sequence>`'), and the recovery rate/accuracy for that protein sequence (as '`Recovery=<recovery_rate_of_sequence>`')
47
+ - Line2: *Single-letter representation* of the amino acids of the ground truth sequence (as `true:VTVGKSAPYFSL...`)
48
+ - Line3: *Single-letter representation* of the amino acids of the sequence predicted by our method (as `pred:TAVGDEAPYFEL...`)
49
+ - An example file content:
50
+ ```
51
+ >name=3fkf.A | L=141 | Recovery=0.5957446694374084
52
+ true:VTVGKSAPYFSLPNEKGEKLSRSAERFRNRYLLLNFWASWCDPQPEANAELKRLNKEYKKNKNFAMLGISLDIDREAWETAIKKDTLSWDQVCDFTGLSSETAKQYAILTLPTNILLSPTGKILARDIQGEALTGKLKELL
53
+ pred:TAVGDEAPYFELPDLEGKKLSLDSEEFKNKYLLLDFWASWCLPCREEIAELKELYRRFAKNKKFAILGVSADTDKEAWLKAVKEDNLRWTQVSDFKGWDSEVFKNYNVQSLPENILLSPEGKILARGIRGEALRNKLKELL
54
+
55
+ >name=2d9e.A | L=121 | Recovery=0.7685950398445129
56
+ true:GSSGSSGFLILLRKTLEQLQEKDTGNIFSEPVPLSEVPDYLDHIKKPMDFFTMKQNLEAYRYLNFDDFEEDFNLIVSNCLKYNAKDTIFYRAAVRLREQGGAVLRQARRQAEKMGSGPSSG
57
+ pred:GSSGSSGRLTLLRETLEQLQERDTGWVFSEPVPLSEVPDYLDVIDHPMDFSTMRRKLEAHRYLSFDEFERDFNLIVENCRKYNAKDTVFYRAAVRLQAQGGAILRKARRDVESLGSGPSSG
58
+ ```
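
The three-line record layout described above can be read back with a few lines of Python. The following is a minimal sketch and not part of Model Generator: the names `DesignRecord`, `parse_results`, and `position_recovery` are hypothetical helpers introduced here for illustration. It also assumes that `Recovery` is the fraction of positions where `pred` matches `true`, which the README does not state explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DesignRecord:
    """One three-line entry of the results file (hypothetical helper type)."""
    name: str        # "<PDB_ID>.<CHAIN_ID>", e.g. "3fkf.A"
    length: int      # value of the L= field
    recovery: float  # value of the Recovery= field, as reported in the file
    true_seq: str    # single-letter ground truth sequence
    pred_seq: str    # single-letter predicted sequence


def position_recovery(true_seq: str, pred_seq: str) -> float:
    """Fraction of identical positions (assumed definition of recovery)."""
    if not true_seq or len(true_seq) != len(pred_seq):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(t == p for t, p in zip(true_seq, pred_seq))
    return matches / len(true_seq)


def parse_results(text: str) -> List[DesignRecord]:
    """Parse the text of a results file into a list of records."""
    # Blank separator lines between entries are ignored.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    records = []
    i = 0
    while i < len(lines):
        if not lines[i].startswith(">"):
            i += 1  # skip anything that is not a header line
            continue
        if i + 2 >= len(lines):
            raise ValueError(f"truncated entry after header: {lines[i]}")
        header, true_line, pred_line = lines[i], lines[i + 1], lines[i + 2]
        if not true_line.startswith("true:") or not pred_line.startswith("pred:"):
            raise ValueError(f"unexpected sequence lines after: {header}")
        # Header looks like: >name=3fkf.A | L=141 | Recovery=0.59
        fields = dict(part.strip().split("=", 1) for part in header[1:].split("|"))
        records.append(
            DesignRecord(
                name=fields["name"],
                length=int(fields["L"]),
                recovery=float(fields["Recovery"]),
                true_seq=true_line.split(":", 1)[1],
                pred_seq=pred_line.split(":", 1)[1],
            )
        )
        i += 3
    return records
```

To use it, read the file contents (for example with `pathlib.Path(...).read_text()`) and pass the string to `parse_results`. Comparing `position_recovery(r.true_seq, r.pred_seq)` with `r.recovery` for each record is a quick sanity check, provided the assumption about how recovery is defined holds.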
59
 
60
 
61
  #### Note: