# ImpScore
---
license: apache-2.0
language:
  - en
tags:
  - metric
  - scoring
  - implicit_language
  - implicitness
  - linguistic
---

This repo contains ImpScore, a metric trained on 112,580 sentence pairs using contrastive learning. It calculates an implicitness score in the range [0, 2] for an English input sentence; a higher score indicates greater implicitness. Additionally, it can calculate the pragmatic distance between two input sentences, with the distance value in the range [0, ∞); a higher distance means the two sentences differ more in their intended meaning.

The training code for this metric is available on GitHub: https://github.com/audreycs/ImpScore


## Download ImpScore

Since the model class is fully customized, you need to download the model file (`impscore.py`) before you can use it.

### Method 1: Dynamic loading

```python
from huggingface_hub import hf_hub_download
import importlib.util

repo_id = "audreyeleven/ImpScore"

# Download the model Python file
model_path = hf_hub_download(repo_id=repo_id, filename="impscore.py")

# Load the model module dynamically
spec = importlib.util.spec_from_file_location("ModelClass", model_path)
model_module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(model_module)

model = model_module.ImpModel.from_pretrained(repo_id)

device = "cuda"  # or "cpu"
model.device = device
model.to(device)
model.eval()
```
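If your code runs on machines both with and without a GPU, the device string can be chosen at runtime instead of hard-coded; a minimal sketch:

```python
# Select "cuda" when a GPU is available, otherwise fall back to "cpu".
# Also falls back to "cpu" when torch is not importable.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"

print(device)
```

The resulting string can then be assigned to `model.device` and passed to `model.to(device)` exactly as above.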

### Method 2: Local loading

You can also download the model Python file (`impscore.py`) to your local machine and import it directly.

```python
from impscore import ImpModel

repo_id = "audreyeleven/ImpScore"
model = ImpModel.from_pretrained(repo_id)

device = "cuda"  # or "cpu"
model.device = device
model.to(device)
model.eval()
```

## Use This Metric

### Calculating the implicitness score for a single sentence

The model provides an `.infer(sentence)` function, which takes a single sentence as input and returns:

- its implicitness score
- its implicit embedding
- its pragmatic embedding

```python
# test inference
imp_score, imp_embedding, prag_embedding = model.infer("I have to leave now. Talk to you later.")
print(imp_score, imp_embedding, prag_embedding)

imp_score, imp_embedding, prag_embedding = model.infer("I can't believe we've talked for so long.")
print(imp_score, imp_embedding, prag_embedding)
```

The outputs:

```
tensor(0.6709, device='cuda:0', grad_fn=<RsubBackward1>) tensor([ 0.0458, -0.0149, -0.0182, -0.0905,  0.0541, -0.0133, ...])
tensor(1.0984, device='cuda:0', grad_fn=<RsubBackward1>) tensor([-0.0086, -0.1357, -0.0067, -0.0513, -0.0225,  0.0664, ...])
```

This means the second sentence "I can't believe we've talked for so long." is more implicit.
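To compare more than two sentences, the scores returned by `.infer` can simply be sorted. A small sketch, reusing the scores printed above (tensor values converted to plain floats):

```python
# (sentence, implicitness score) pairs, taken from the example output above
scored = [
    ("I have to leave now. Talk to you later.", 0.6709),
    ("I can't believe we've talked for so long.", 1.0984),
]

# Sort from most to least implicit (a higher score means more implicit)
ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
most_implicit = ranked[0][0]
```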

### Calculating implicitness scores and pragmatic distance for sentence pairs

The `.infer_pairs(sent_batch1, sent_batch2)` function takes a batch of sentence pairs as input and calculates:

- each sentence's implicitness score
- the pragmatic distance between the two sentences in each pair

`sent_batch1` is the list of the first sentence in each pair, and `sent_batch2` is the list of the second sentence in each pair.

```python
sentence_batch = [
    ["I have to leave now. Talk to you later.", "I can't believe we've talked for so long."],
    ["You must find a new place and move out by the end of this month.", "Maybe exploring other housing options could benefit us both?"]
]
s1 = [pair[0] for pair in sentence_batch]  # first sentence of each pair
s2 = [pair[1] for pair in sentence_batch]  # second sentence of each pair
imp_score1, imp_score2, prag_distance = model.infer_pairs(s1, s2)
print(imp_score1, imp_score2, prag_distance)
```

The output is:

```
tensor([0.6709, 0.9273]) tensor([1.0984, 1.3642]) tensor([0.6660, 0.7115])
```

This means the implicitness score for

- "I have to leave now. Talk to you later." is 0.6709
- "I can't believe we've talked for so long." is 1.0984

and the pragmatic distance between these two sentences is 0.6660.
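The splitting of `sentence_batch` into `s1` and `s2` above can also be written with Python's built-in `zip`; a brief sketch using the same pairs:

```python
sentence_batch = [
    ["I have to leave now. Talk to you later.", "I can't believe we've talked for so long."],
    ["You must find a new place and move out by the end of this month.", "Maybe exploring other housing options could benefit us both?"],
]

# zip(*batch) transposes the list of pairs into (first sentences, second sentences)
s1, s2 = (list(column) for column in zip(*sentence_batch))
```

`s1` and `s2` can then be passed to `model.infer_pairs(s1, s2)` as in the example above.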