---
license: apache-2.0
language:
- en
tags:
- metric
- scoring
- implicit_language
- implicitness
- linguistic
---

This repo contains **ImpScore**, a metric trained on 112,580 sentence pairs using contrastive learning.

It computes an implicitness score in the range [0, 2] for an English input sentence. **A higher score indicates greater implicitness**. It can also compute the pragmatic distance between two input sentences, which lies in the range [0, ∞). **A higher distance means the two sentences differ more in their intended meaning**.

The training code for this metric is available on GitHub: [https://github.com/audreycs/ImpScore](https://github.com/audreycs/ImpScore)

<br>

# Download ImpScore

ImpScore uses a custom model class, so you need to download the model's Python file before you can load it.

### Method 1: Dynamic loading

```python
import importlib.util

from huggingface_hub import hf_hub_download

repo_id = "audreyeleven/ImpScore"

# Download the model's Python file from the Hub
model_path = hf_hub_download(repo_id=repo_id, filename="impscore.py")

# Load the downloaded file as a module
spec = importlib.util.spec_from_file_location("ModelClass", model_path)
model_module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(model_module)

# Instantiate the model with its pretrained weights
model = model_module.ImpModel.from_pretrained(repo_id)

device = "cuda"  # or "cpu"
model.device = device
model.to(device)

model.eval()
```

### Method 2: Local loading

You can also download the model's Python file (`impscore.py`) to your local working directory and import it directly.

```python
from impscore import ImpModel

repo_id = "audreyeleven/ImpScore"
model = ImpModel.from_pretrained(repo_id)

device = "cuda"  # or "cpu"
model.device = device
model.to(device)

model.eval()
```
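
Both loading methods hard-code `device = "cuda"`, which raises an error on a machine without a GPU. A minimal sketch of choosing the device at runtime, assuming PyTorch is the backend (the outputs in this card are `torch` tensors); `pick_device` is a hypothetical helper and not part of ImpScore:

```python
def pick_device(prefer: str = "cuda") -> str:
    """Return "cuda" only when it is requested and actually available, else "cpu".

    Hypothetical helper for illustration; it is not part of ImpScore.
    """
    if prefer != "cuda":
        return "cpu"
    try:
        import torch  # imported lazily so the helper has no hard dependency
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)  # "cuda" or "cpu", depending on the machine
```

The result can replace the hard-coded string before calling `model.device = device` and `model.to(device)`.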

<br>

# Use This Metric

### Calculating the implicitness score for a single sentence

The model provides an `.infer(sentence)` method, which takes a single sentence as input and returns:

- its implicitness score
- its implicit embedding
- its pragmatic embedding

```python
# test inference
imp_score, imp_embedding, prag_embedding = model.infer("I have to leave now. Talk to you later.")
print(imp_score, imp_embedding, prag_embedding)

imp_score, imp_embedding, prag_embedding = model.infer("I can't believe we've talked for so long.")
print(imp_score, imp_embedding, prag_embedding)
```

The outputs:

```
tensor(0.6709, device='cuda:0', grad_fn=<RsubBackward1>) tensor([ 0.0458, -0.0149, -0.0182, -0.0905, 0.0541, -0.0133, ...])
tensor(1.0984, device='cuda:0', grad_fn=<RsubBackward1>) tensor([-0.0086, -0.1357, -0.0067, -0.0513, -0.0225, 0.0664, ...])
```

This means the second sentence *"I can't believe we've talked for so long."* is more implicit.
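
When comparing more than two sentences, it can help to turn the returned scores into plain floats and sort them. The sketch below assumes each score is either a number or a 0-dimensional tensor exposing `.item()`, as the printed outputs suggest. `to_float` and `rank_by_implicitness` are hypothetical helpers, and the scores are the two values from the example output rather than fresh model results.

```python
def to_float(score):
    """Convert a number or a 0-dim tensor-like object (anything with .item()) to float."""
    item = getattr(score, "item", None)
    return float(item()) if callable(item) else float(score)


def rank_by_implicitness(scored_sentences):
    """Sort (sentence, score) pairs from most to least implicit."""
    return sorted(scored_sentences, key=lambda pair: to_float(pair[1]), reverse=True)


# Scores copied from the example output, not recomputed here.
scored = [
    ("I have to leave now. Talk to you later.", 0.6709),
    ("I can't believe we've talked for so long.", 1.0984),
]
ranking = rank_by_implicitness(scored)
for sentence, score in ranking:
    print(f"{to_float(score):.4f}  {sentence}")
```

With a loaded model, the pairs could be built as `(s, model.infer(s)[0])`. The `grad_fn` in the printed tensors indicates that gradients are being tracked, so wrapping inference in `torch.no_grad()` is a common way to avoid that overhead.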

### Calculating implicitness scores and pragmatic distance for sentence pairs

The `.infer_pairs(sent_batch1, sent_batch2)` method takes pairs of sentences as input and calculates:

- their individual implicitness scores
- their pragmatic distance

`sent_batch1` is the list of the first sentences of the pairs, and `sent_batch2` is the list of the second sentences.

```python
sentence_batch = [
    ["I have to leave now. Talk to you later.", "I can't believe we've talked for so long."],
    ["You must find a new place and move out by the end of this month.", "Maybe exploring other housing options could benefit us both?"]
]

# Split the pairs into two parallel lists: first sentences and second sentences
s1 = [pair[0] for pair in sentence_batch]
s2 = [pair[1] for pair in sentence_batch]

imp_score1, imp_score2, prag_distance = model.infer_pairs(s1, s2)
print(imp_score1, imp_score2, prag_distance)
```
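
Because `.infer_pairs` expects two parallel lists, a list of pairs has to be transposed first. Note that `pairs[:]` only copies the outer list, so `pairs[:][0]` is the first pair, not the list of first sentences. A minimal sketch of the split with a basic shape check, where `split_pairs` is a hypothetical helper rather than part of ImpScore:

```python
def split_pairs(pairs):
    """Split [(a1, b1), (a2, b2), ...] into ([a1, a2, ...], [b1, b2, ...])."""
    for i, pair in enumerate(pairs):
        if len(pair) != 2:
            raise ValueError(f"pair {i} has {len(pair)} items, expected 2")
    firsts = [pair[0] for pair in pairs]
    seconds = [pair[1] for pair in pairs]
    return firsts, seconds


pairs = [
    ["I have to leave now. Talk to you later.", "I can't believe we've talked for so long."],
    ["You must find a new place and move out by the end of this month.", "Maybe exploring other housing options could benefit us both?"],
]
s1, s2 = split_pairs(pairs)

print(pairs[:][0] == pairs[0])  # True: slicing does not select a column
print(s1)
print(s2)
```

The resulting `s1` and `s2` can then be passed to `model.infer_pairs(s1, s2)`.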

The output is:

```
tensor([0.6709, 0.9273]) tensor([1.0984, 1.3642]) tensor([0.6660, 0.7115])
```

This means that, for the first pair:

- the implicitness score of *"I have to leave now. Talk to you later."* is 0.6709
- the implicitness score of *"I can't believe we've talked for so long."* is 1.0984
- the pragmatic distance between the two sentences is 0.6660
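
The same reading should apply to every pair in the batch, assuming position `i` of each returned tensor belongs to pair `i`, as the first pair suggests. A small sketch that lines the three outputs up per pair, using the printed numbers as plain lists (with real tensors, `.tolist()` would give lists of the same shape); `summarize_pairs` and `more_implicit` are hypothetical helpers:

```python
def more_implicit(score1, score2):
    """Say which sentence of a pair has the higher implicitness score."""
    if score1 > score2:
        return "first"
    if score2 > score1:
        return "second"
    return "tie"


def summarize_pairs(imp_scores1, imp_scores2, distances):
    """Combine the three per-pair outputs into one record per pair."""
    if not (len(imp_scores1) == len(imp_scores2) == len(distances)):
        raise ValueError("all three outputs must have the same length")
    return [
        {
            "imp_score1": a,
            "imp_score2": b,
            "prag_distance": d,
            "more_implicit": more_implicit(a, b),
        }
        for a, b, d in zip(imp_scores1, imp_scores2, distances)
    ]


# Values copied from the example output, not recomputed here.
summary = summarize_pairs([0.6709, 0.9273], [1.0984, 1.3642], [0.6660, 0.7115])
for i, record in enumerate(summary):
    print(i, record)
```

Keeping the three values in one record per pair makes it harder to misalign scores and distances when the batch grows.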