	---
	license: apache-2.0
	language:
	- en
	tags:
	- metric
	- scoring
	- implicit_language
	- implicitness
	- linguistic
	---
	This repo contains ImpScore, a metric trained on 112,580 sentence pairs using contrastive learning.
	Given an English input sentence, it computes an implicitness score in the range [0, 2]; a higher score indicates greater implicitness. It can also compute the pragmatic distance between two input sentences, which lies in [0, ∞); a larger distance means the two sentences differ more in their intended meaning.
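
The two ranges can be illustrated with a small sketch. It assumes, without confirmation from this README, that the implicitness score behaves like a cosine distance (1 minus cosine similarity, which is bounded by [0, 2]) and that the pragmatic distance behaves like a Euclidean distance (non-negative and unbounded above). The function names and toy vectors are hypothetical and are not part of ImpScore.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; lies in [0, 2] for non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance; lies in [0, inf)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy vectors (assumed for illustration only)
same_direction = cosine_distance([1.0, 0.0], [2.0, 0.0])      # lower bound
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])          # midpoint
opposite_direction = cosine_distance([1.0, 0.0], [-3.0, 0.0])  # upper bound
far_apart = euclidean_distance([0.0, 0.0], [3.0, 4.0])         # grows without limit
```

Under these assumptions, a score near 0 or near 2 corresponds to two internal vectors pointing in nearly the same or nearly opposite directions, while the distance has no fixed upper end.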

	The training code for this metric is available on GitHub: [https://github.com/audreycs/ImpScore](https://github.com/audreycs/ImpScore)

	<br>

	# Download ImpScore:
	Because ImpScore uses a custom model class, you must first download the model's Python file (`impscore.py`) before you can load it.

	### Method 1: Dynamic loading
	```python
	from huggingface_hub import hf_hub_download


	repo_id = "audreyeleven/ImpScore"

	# Download the model python file
	model_path = hf_hub_download(repo_id=repo_id, filename="impscore.py")

	# Load the model dynamically
	import importlib.util
	spec = importlib.util.spec_from_file_location("ModelClass", model_path)
	model_module = importlib.util.module_from_spec(spec)
	spec.loader.exec_module(model_module)

	model = model_module.ImpModel.from_pretrained(repo_id)

	device = "cuda" # or "cpu"
	model.device = device
	model.to(device)

	model.eval()
	```
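
The dynamic-loading step can be tried in isolation, without downloading anything. The sketch below wraps the same `importlib` calls in a helper and exercises it on a tiny stand-in module written to a temporary directory. The helper name `load_module_from_path`, the file `toy_model.py`, and the class `ToyModel` are hypothetical and exist only for this illustration.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(module_name, file_path):
    """Import a Python source file that is not on sys.path."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a module from {file_path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Write a stand-in for the downloaded file, then load it by path.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.py")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("class ToyModel:\n    name = 'toy'\n")
    toy_module = load_module_from_path("toy_model", path)

loaded_name = toy_module.ToyModel.name
```

With the real file, `load_module_from_path("impscore", model_path).ImpModel` would play the role of `model_module.ImpModel` in Method 1.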

	### Method 2: Local loading
	You can also download the model's Python file (`impscore.py`) to your local working directory and import it directly.
	```python
	from impscore import ImpModel

	# repo_id must be defined before calling from_pretrained
	repo_id = "audreyeleven/ImpScore"
	model = ImpModel.from_pretrained(repo_id)

	device = "cuda" # or "cpu"
	model.device = device
	model.to(device)

	model.eval()
	```

	<br>

	# Use This Metric
	### Calculating the implicitness score for a single sentence
	The model's `.infer(sentence)` method takes a single sentence as input and returns:
	- its implicitness score
	- its implicit embedding
	- its pragmatic embedding

	```python
	# test inference

	imp_score, imp_embedding, prag_embedding = model.infer("I have to leave now. Talk to you later.")
	print(imp_score, imp_embedding, prag_embedding)
	imp_score, imp_embedding, prag_embedding = model.infer("I can't believe we've talked for so long.")
	print(imp_score, imp_embedding, prag_embedding)
	```
	The output (truncated) is:
	```
	tensor(0.6709, device='cuda:0', grad_fn=<RsubBackward1>) tensor([ 0.0458, -0.0149, -0.0182, -0.0905, 0.0541, -0.0133, ...])
	tensor(1.0984, device='cuda:0', grad_fn=<RsubBackward1>) tensor([-0.0086, -0.1357, -0.0067, -0.0513, -0.0225, 0.0664, ...])
	```
	The second sentence, "I can't believe we've talked for so long.", has the higher score, so it is the more implicit of the two.
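
Comparing more than two sentences follows the same idea: score each one and sort. The sketch below is a minimal, model-free illustration. `rank_by_implicitness` is a hypothetical helper, and the stand-in scorer simply looks up the two scores printed above rather than calling the model.

```python
def rank_by_implicitness(sentences, score_fn):
    """Return (sentence, score) pairs sorted from most to least implicit."""
    scored = [(sentence, float(score_fn(sentence))) for sentence in sentences]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Stand-in scorer: the two scores shown in the output above.
example_scores = {
    "I have to leave now. Talk to you later.": 0.6709,
    "I can't believe we've talked for so long.": 1.0984,
}

ranking = rank_by_implicitness(list(example_scores), example_scores.get)
most_implicit = ranking[0][0]
```

With the loaded model, a scorer such as `lambda s: model.infer(s)[0].item()` should work in place of `example_scores.get`, assuming the first returned value is the scalar score tensor as shown above.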

	### Calculating implicitness scores and pragmatic distance for sentence pairs
	The `.infer_pairs(sent_batch1, sent_batch2)` method takes pairs of sentences as input and calculates:
	- the implicitness score of each sentence
	- the pragmatic distance within each pair

	`sent_batch1` is the list of first sentences, one per pair, and `sent_batch2` is the list of the corresponding second sentences.

	```python
	sentence_batch = [
	["I have to leave now. Talk to you later.", "I can't believe we've talked for so long."],
	["You must find a new place and move out by the end of this month.", "Maybe exploring other housing options could benefit us both?"]
	]
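	# Split the pairs into two parallel lists (sentence_batch[:][0] would return the first pair, not the first sentences)
	s1 = [pair[0] for pair in sentence_batch]
	s2 = [pair[1] for pair in sentence_batch]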
	imp_score1, imp_score2, prag_distance = model.infer_pairs(s1, s2)
	print(imp_score1, imp_score2, prag_distance)
	```

	The output is:
	```
	tensor([0.6709, 0.9273]) tensor([1.0984, 1.3642]) tensor([0.6660, 0.7115])
	```
	For the first pair, this means:
	- the implicitness score of "I have to leave now. Talk to you later." is 0.6709
	- the implicitness score of "I can't believe we've talked for so long." is 1.0984
	- the pragmatic distance between the two sentences is 0.6660
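
Splitting pairs into two parallel lists and reading the three result vectors back per pair can be sketched without the model. `split_pairs` and `summarize_pairs` are hypothetical helpers, and the numbers are copied from the output above as plain floats rather than computed here.

```python
def split_pairs(pairs):
    """Turn [(a1, b1), (a2, b2), ...] into ([a1, a2, ...], [b1, b2, ...])."""
    if not pairs:
        return [], []
    firsts, seconds = zip(*pairs)
    return list(firsts), list(seconds)


def summarize_pairs(pairs, scores1, scores2, distances):
    """Build one readable record per sentence pair."""
    rows = []
    for (first, second), sc1, sc2, dist in zip(pairs, scores1, scores2, distances):
        sc1, sc2, dist = float(sc1), float(sc2), float(dist)
        if sc2 > sc1:
            more_implicit = "second"
        elif sc1 > sc2:
            more_implicit = "first"
        else:
            more_implicit = "tie"
        rows.append({
            "first": first,
            "second": second,
            "score_first": sc1,
            "score_second": sc2,
            "distance": dist,
            "more_implicit": more_implicit,
        })
    return rows


sentence_batch = [
    ["I have to leave now. Talk to you later.", "I can't believe we've talked for so long."],
    ["You must find a new place and move out by the end of this month.", "Maybe exploring other housing options could benefit us both?"],
]
s1, s2 = split_pairs(sentence_batch)

# Values copied from the printed output above (not recomputed).
rows = summarize_pairs(sentence_batch, [0.6709, 0.9273], [1.0984, 1.3642], [0.6660, 0.7115])
```

With the loaded model, passing `.tolist()` versions of the three tensors returned by `model.infer_pairs(s1, s2)` should work in place of the hard-coded lists.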