Two questions about RNA secondary structure prediction task

#1
by chiyazzz - opened

First, thanks for the excellent work!
I have a question about the RNA secondary structure prediction task. The performance reported in Supplementary Table 1 is inconsistent with the main text of the paper. For example, the PlantRNA-FM score on ArchiveII is 0.855 in Supplementary Table 1 but 0.924 in the paper. The data in Source Data Extended Data Fig. 1 is also inconsistent with both Supplementary Table 1 and the main text. Which one is correct?
My second question: what does PlantRNA-FM-RNA-Only mean? I cannot find an explanation in the paper.

Also, it seems that the model in this Hugging Face repository does not include the MLM, secondary structure, or annotation prediction heads.

Both scores are valid. The lower score was obtained after we removed similar structures. We have only released the MLM model, as shown in the example.

from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name_or_path = "yangheng/PlantRNA-FM"

model = AutoModelForMaskedLM.from_pretrained(model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
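
On the two scores: one way to picture what "removed similar structures" could mean is a filtering step that drops test structures too close to anything seen in training, which leaves a smaller and usually harder test set. The sketch below is only an illustration under stated assumptions. It is not the pipeline used in the paper. The function name `remove_similar`, the 0.8 threshold, the use of `difflib.SequenceMatcher` on dot-bracket strings as a crude similarity measure, and the toy data are all hypothetical.

```python
from difflib import SequenceMatcher


def remove_similar(test_set, train_set, threshold=0.8):
    """Keep test structures whose best similarity to any training
    structure is below `threshold` (hypothetical cutoff)."""
    kept = []
    for candidate in test_set:
        # Highest string similarity between this test structure and
        # any training structure; 0.0 if the training set is empty.
        best = max(
            (SequenceMatcher(None, candidate, ref).ratio() for ref in train_set),
            default=0.0,
        )
        if best < threshold:
            kept.append(candidate)
    return kept


# Toy dot-bracket structures, made up for illustration only.
train = ["((((....))))", "..((..))...."]
test = ["((((....))))", "(((......)))", "............"]

filtered = remove_similar(test, train)
print(f"{len(test)} test structures before filtering, {len(filtered)} after")
```

A model evaluated on `test` and on `filtered` would report two different numbers for the same benchmark, and both would be valid for the set they were computed on.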

Thanks! And what does PlantRNA-FM-RNA-Only mean?

It refers to the model without structure pretraining.

yangheng changed discussion status to closed
