---
title: ParaPLUIE
emoji: ☂️
tags:
  - evaluate
  - metric
description: >-
  ParaPLUIE is a metric for evaluating the semantic proximity between two
  sentences. ParaPLUIE uses the perplexity of an LLM to compute a confidence
  score. It has shown the highest correlation with human judgment on paraphrase
  classification while maintaining a low computational cost, as it is roughly
  equivalent to the cost of generating a single token.
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
short_description: ParaPLUIE is a metric for evaluating the semantic proximity
---
# Metric Card for ParaPLUIE (Paraphrase Generation Evaluation Powered by an LLM)

## Metric Description
ParaPLUIE is a metric for evaluating the semantic proximity between two sentences. ParaPLUIE uses the perplexity of an LLM to compute a confidence score. It has shown the highest correlation with human judgment on paraphrase classification while maintaining a low computational cost, as it is roughly equivalent to the cost of generating a single token.
## How to Use
This metric requires a source sentence and its hypothetical paraphrase.
```python
import evaluate

ppluie = evaluate.load("qlemesle/parapluie")
ppluie.init(model="mistralai/Mistral-7B-Instruct-v0.2")

S = "Have you ever seen a tsunami ?"
H = "Have you ever seen a tiramisu ?"

results = ppluie.compute(sources=[S], hypotheses=[H])
print(results)
>>> {'scores': [-16.97607421875]}
```
### Inputs

- **sources** (`list` of `string`): Source sentences.
- **hypotheses** (`list` of `string`): Hypothetical paraphrases.

### Output Values

- **scores** (`list` of `float`): One ParaPLUIE score per (source, hypothesis) pair. The minimum possible value is -inf and the maximum possible value is +inf. A score greater than 0 means that the sentences are paraphrases; a score lower than 0 indicates the opposite.

This metric outputs a dictionary containing the scores.
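
The sign of the score can therefore be read directly as a paraphrase / not-a-paraphrase decision. Below is a minimal sketch of this interpretation, using the score from the example above; the `label_pairs` helper is illustrative and not part of the metric's API:

```python
# Illustrative helper, not part of the ParaPLUIE API: a score above 0 is
# read as "paraphrase", anything at or below 0 as "not a paraphrase".
def label_pairs(scores):
    return ["paraphrase" if s > 0 else "not a paraphrase" for s in scores]

results = {"scores": [-16.97607421875]}  # output of the example above
print(label_pairs(results["scores"]))    # ['not a paraphrase']
```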
## Examples

### Simple example
```python
import evaluate

ppluie = evaluate.load("qlemesle/parapluie")
ppluie.init(model="mistralai/Mistral-7B-Instruct-v0.2")

S = "Have you ever seen a tsunami ?"
H = "Have you ever seen a tiramisu ?"

results = ppluie.compute(sources=[S], hypotheses=[H])
print(results)
>>> {'scores': [-16.97607421875]}
```
### Configure metric

```python
ppluie.init(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    device="cuda:0",
    template="FS-DIRECT",
    use_chat_template=True,
    half_mode=True,
    n_right_specials_tokens=1,
)
```
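
Since `compute` takes lists of sources and hypotheses and returns one score per pair, several pairs can be scored in a single call with the metric configured above. A minimal sketch follows; the sentence pairs are illustrative, not taken from the original examples:

```python
# Illustrative sentence pairs (not from the original card); `ppluie` is the
# metric object configured in the previous snippet.
sources = [
    "The cat sat on the mat.",
    "She completed the report before the deadline.",
]
hypotheses = [
    "A cat was sitting on the mat.",
    "She missed the deadline for the report.",
]

results = ppluie.compute(sources=sources, hypotheses=hypotheses)
print(results["scores"])  # one ParaPLUIE score per (source, hypothesis) pair
```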
### Show the available prompting templates

```python
ppluie.show_templates()
>>> DIRECT
>>> MEANING
>>> INDIRECT
>>> FS-DIRECT
>>> FS-DIRECT_MAJ
>>> FS-DIRECT_FR
>>> FS-DIRECT_MAJ_FR
>>> FS-DIRECT_FR_MIN
>>> NETWORK
```
### Show the LLMs that have already been tested with ParaPLUIE

```python
ppluie.show_available_models()
>>> HuggingFaceTB/SmolLM2-135M-Instruct
>>> HuggingFaceTB/SmolLM2-360M-Instruct
>>> HuggingFaceTB/SmolLM2-1.7B-Instruct
>>> google/gemma-2-2b-it
>>> state-spaces/mamba-2.8b-hf
>>> internlm/internlm2-chat-1_8b
>>> microsoft/Phi-4-mini-instruct
>>> mistralai/Mistral-7B-Instruct-v0.2
>>> tiiuae/falcon-mamba-7b-instruct
>>> Qwen/Qwen2.5-7B-Instruct
>>> CohereForAI/aya-expanse-8b
>>> google/gemma-2-9b-it
>>> meta-llama/Meta-Llama-3-8B-Instruct
>>> microsoft/phi-4
>>> CohereForAI/aya-expanse-32b
>>> Qwen/QwQ-32B
>>> CohereForAI/c4ai-command-r-08-2024
```
### Change the prompting template

```python
ppluie.setTemplate("DIRECT")
```
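
After switching templates, the same pairs can be re-scored with `compute`. A minimal sketch re-using `S` and `H` from the simple example above (no output is shown, since the value depends on the chosen template and model):

```python
# Re-score the pair from the simple example with the newly selected template.
results = ppluie.compute(sources=[S], hypotheses=[H])
print(results["scores"])
```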
### Check the prompt encoding

Show how the prompt is encoded to ensure that the correct number of special tokens is removed and that the words "Yes" and "No" each fit into a single token:

```python
ppluie.check_end_tokens_tmpl()
```
## Limitations and Bias
This metric is based on an LLM and is therefore limited by the LLM that is used.
## Source code
## Citation

```bibtex
@inproceedings{lemesle-etal-2025-paraphrase,
    title = "Paraphrase Generation Evaluation Powered by an {LLM}: A Semantic Metric, Not a Lexical One",
    author = "Lemesle, Quentin  and
      Chevelu, Jonathan  and
      Martin, Philippe  and
      Lolive, Damien  and
      Delhay, Arnaud  and
      Barbot, Nelly",
    booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
    year = "2025",
    url = "https://aclanthology.org/2025.coling-main.538/"
}
```