---
language: en
tags:
- bert
- regression
- biencoder
- similarity
pipeline_tag: text-similarity
---
# BiEncoder Regression Model

This model uses a BiEncoder architecture to predict a similarity score for a pair of input texts.
## Model Details

- Base Model: bert-base-uncased
- Task: Regression
- Architecture: BiEncoder with cosine similarity (see the sketch below)
- Loss Function: contrastive
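
To make the architecture concrete, here is a minimal, illustrative sketch of a biencoder forward pass: each text is encoded independently with a shared BERT encoder, token embeddings are mean-pooled, and the cosine similarity of the two pooled vectors is the regression output. The pooling strategy and weight sharing shown here are assumptions for illustration; the actual `BiEncoderModelRegression` in this repo's `modeling.py` may differ in detail.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def mean_pool(last_hidden_state, attention_mask):
    # Average token embeddings, ignoring padding positions
    mask = attention_mask.unsqueeze(-1).float()
    return (last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

encoder = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def similarity(texts_a, texts_b):
    # Encode each side of the pair independently with the shared encoder
    batch_a = tokenizer(texts_a, padding=True, truncation=True, return_tensors="pt")
    batch_b = tokenizer(texts_b, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        emb_a = mean_pool(encoder(**batch_a).last_hidden_state, batch_a["attention_mask"])
        emb_b = mean_pool(encoder(**batch_b).last_hidden_state, batch_b["attention_mask"])
    # Cosine similarity between the pooled embeddings is the predicted score
    return F.cosine_similarity(emb_a, emb_b)

print(similarity(["first text"], ["second text"]))
```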
## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModel

from modeling import BiEncoderModelRegression

# Load the tokenizer and the BERT backbone
tokenizer = AutoTokenizer.from_pretrained("minoosh/bert-reg-biencoder-contrastive")
base_model = AutoModel.from_pretrained("bert-base-uncased")
model = BiEncoderModelRegression(base_model, loss_fn="contrastive")

# Load the fine-tuned weights
state_dict = torch.load("pytorch_model.bin")
model.load_state_dict(state_dict)
model.eval()

# Prepare inputs
texts1 = ["first text"]
texts2 = ["second text"]
inputs = tokenizer(
    texts1, texts2,
    padding=True,
    truncation=True,
    return_tensors="pt"
)

# Get similarity scores
with torch.no_grad():
    outputs = model(**inputs)
similarity_scores = outputs["logits"]
```
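
The snippet above assumes `pytorch_model.bin` is already available locally. If it is not, one way to fetch it from the Hub is with `huggingface_hub`; this sketch assumes the checkpoint is stored under that filename in this repo.

```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the model repo (cached locally by huggingface_hub)
weights_path = hf_hub_download(
    repo_id="minoosh/bert-reg-biencoder-contrastive",
    filename="pytorch_model.bin",
)

# `model` is the BiEncoderModelRegression instance constructed above
state_dict = torch.load(weights_path, map_location="cpu")
model.load_state_dict(state_dict)
```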
## Metrics

The model was trained with a contrastive loss and evaluated with the following metrics (a sketch of how they can be computed follows the list):
- Mean Squared Error (MSE)
- Mean Absolute Error (MAE)
- Pearson Correlation
- Spearman Correlation
- Cosine Similarity
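
For reference, here is a minimal sketch of how these metrics can be computed from arrays of predicted and gold similarity scores using `numpy`, `scipy`, and `scikit-learn`. Treating "Cosine Similarity" as the cosine between the prediction and gold score vectors is an assumption; the card does not specify how that metric is measured.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import mean_absolute_error, mean_squared_error

def evaluate(preds, golds):
    preds = np.asarray(preds, dtype=float)
    golds = np.asarray(golds, dtype=float)
    return {
        "mse": mean_squared_error(golds, preds),
        "mae": mean_absolute_error(golds, preds),
        "pearson": pearsonr(golds, preds)[0],
        "spearman": spearmanr(golds, preds)[0],
        # Assumption: cosine similarity between the gold and predicted score vectors
        "cosine": float(np.dot(golds, preds) / (np.linalg.norm(golds) * np.linalg.norm(preds))),
    }

print(evaluate([0.9, 0.2, 0.5], [1.0, 0.1, 0.6]))
```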