Discrepancies in Model Output Precision and Warning Messages

#6
by razsoriginals - opened

I am currently working with this text-encoding model and have encountered some discrepancies when comparing my output with that of the official example code for the same model. Here are the specifics:

  1. Warning Messages:

    • When loading the model checkpoint, I received warnings about unused weights:
      Some weights of the model checkpoint at ../model/stella_en_400M_v5 were not used when initializing NewModel: ['new.pooler.dense.bias', 'new.pooler.dense.weight'] 
      This IS expected if you are initializing NewModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
      This IS NOT expected if you are initializing NewModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
      
      This indicates that some weights from the checkpoint are not used due to differences in the model architecture or other factors. While these warnings are expected and can be ignored if the weights are irrelevant, I want to ensure that ignoring them does not affect the model's performance.
  2. Output Discrepancies:

    • The output similarity scores produced by the model are slightly different from those in the official example:
      • Official Output:
        [[0.8397531  0.29900077]
         [0.32818374 0.80954516]]
        
      • My Output:
        [[0.83975303 0.29900068]
         [0.32818383 0.8095452 ]]
        

      The numerical differences seem minor, but they may indicate differences in model implementation, floating-point precision, or data handling (a quick check against the numbers above is included below). Consistent outputs are important for reliable retrieval in a RAG pipeline with an LLM.
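For concreteness, the element-wise gap between the two matrices is on the order of 1e-7, which is within single-precision rounding. This can be checked directly from the numbers quoted above:

```python
import numpy as np

# Similarity matrices copied verbatim from the two outputs above.
official = np.array([[0.8397531,  0.29900077],
                     [0.32818374, 0.80954516]], dtype=np.float32)
mine = np.array([[0.83975303, 0.29900068],
                 [0.32818383, 0.8095452]], dtype=np.float32)

print(np.abs(official - mine).max())           # ~1e-7, i.e. float32 rounding noise
print(np.allclose(official, mine, atol=1e-6))  # True
```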

I would like to discuss the following:

  • Impact of Warning Messages: How critical are these warnings? Should they be a cause for concern, or can they be safely ignored in the context of RAG with LLM?
  • Precision Differences: What might be causing these small discrepancies in the output? Could they be due to differences in model implementation, floating-point precision, or other factors?
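To help narrow down the precision question, one quick test is to recompute the similarity step in higher precision and see whether the gap changes. Below is a rough sketch of what I mean; the model path comes from the warning above, while the prompt name, the trust_remote_code flag, and the query/document texts are placeholders rather than the official example:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholders -- substitute the actual queries/documents and prompt name.
model = SentenceTransformer("../model/stella_en_400M_v5", trust_remote_code=True)
query_emb = model.encode(["example query 1", "example query 2"], prompt_name="s2p_query")
doc_emb = model.encode(["example document 1", "example document 2"])

def cosine(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# If the float64 result still differs from the official numbers, the variation
# most likely originates inside the forward pass (hardware, kernels, batching)
# rather than in the similarity computation itself.
print(cosine(query_emb.astype(np.float32), doc_emb.astype(np.float32)))
print(cosine(query_emb.astype(np.float64), doc_emb.astype(np.float64)))
```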

Understanding these points is important for ensuring that the model's performance and results align with those of the official code. Any insights or recommendations on how to address these issues would be greatly appreciated, especially in the context of integrating this model into a RAG pipeline with an LLM.

StellaEncoder org

Hi, thank you for the detailed feedback!

  1. Warning Messages
    This model does not need the pooler weights; we only use the last_hidden_state and add a Linear layer after it, so you can safely ignore this warning. (A rough sketch of this path is included after this list.)

  2. Output Discrepancies
    The differences in the numerical results are minor, and you can safely ignore them. This is not my area of expertise; I can only surmise that they are related to differences in software, hardware, batch_size, or max_length.
    Could you please provide the code that reproduces your output?
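Regarding item 1, the embedding path works roughly as sketched below: the encoder's last_hidden_state is pooled and then projected by a separate Linear layer, so the checkpoint's pooler.dense.* weights are simply never called. The pooling details and dimensions here are illustrative assumptions, not copied from the repository code.

```python
import torch

def embed(encoder, vector_linear, input_ids: torch.Tensor,
          attention_mask: torch.Tensor) -> torch.Tensor:
    # last_hidden_state -> (masked mean) pooling -> Linear projection.
    # The checkpoint's `pooler.dense.*` weights are never used on this path,
    # which is why the loading warning above is harmless.
    last_hidden = encoder(input_ids=input_ids,
                          attention_mask=attention_mask).last_hidden_state
    mask = attention_mask.unsqueeze(-1).float()
    pooled = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1)  # masked mean pooling (illustrative)
    return vector_linear(pooled)  # e.g. torch.nn.Linear(hidden_size, vector_dim)
```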

In addition, I will update the official example to make it easier to reproduce.
