RocqStar Ranker Embedder

A self-attentive embedding model for premise and proof selection in the Rocq interactive theorem prover. RocqStar is fine-tuned from CodeBERT so that distances in the embedding space correlate with proof similarity rather than with surface-level statement overlap. It replaces BM25/Jaccard similarity in retrieval-augmented generation (RAG) pipelines such as CoqPilot, leading to higher proof success rates on the IMM-300 benchmark.

Model Details

Model Description

RocqStar is a 125 M‑parameter Transformer encoder (768‑dim hidden size) with multi‑head self‑attention and a learned self‑attentive pooling head. It is trained with an InfoNCE contrastive objective so that the cosine similarity of two statement embeddings approximates the similarity of their proofs, measured by a hybrid Levenshtein + Jaccard distance. The model takes tokenised Rocq (Gallina) theorem statements as input and outputs a 768‑d embedding.

  • Model type: Transformer encoder with self‑attentive pooling
  • Language(s): Rocq / Coq (Gallina) syntax (tokens)
  • License: Apache‑2.0
  • Fine‑tuned from: microsoft/codebert-base
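
The supervision signal is the proof-level distance mentioned above. Below is a minimal sketch of such a hybrid Levenshtein + Jaccard distance; the exact tokenisation and weighting used to build the training pairs are not stated on this card, so the whitespace tokenisation and the 50/50 mix are assumptions.

```python
# Hedged sketch of a hybrid proof distance (normalised Levenshtein + Jaccard).
# Function names, the alpha weight, and whitespace tokenisation are assumptions.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def proof_distance(proof_a: str, proof_b: str, alpha: float = 0.5) -> float:
    """Blend normalised edit distance with Jaccard distance over proof tokens."""
    lev = levenshtein(proof_a, proof_b) / max(len(proof_a), len(proof_b), 1)
    ta, tb = set(proof_a.split()), set(proof_b.split())
    jac = 1.0 - len(ta & tb) / max(len(ta | tb), 1)
    return alpha * lev + (1.0 - alpha) * jac
```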

Model Sources

  • Repository: https://github.com/JetBrains-Research/big-rocq (see the ranker-server subdirectory)
  • Hugging Face model repo: JetBrains-Research/rocq-language-theorem-embeddings

How to Get Started

The inference server lives in the ranker-server subdirectory of https://github.com/JetBrains-Research/big-rocq. Run it to use the model from the CoqPilot plugin; its code also serves as a reference example of how to run inference with the model.
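
For quick experiments outside the server, the checkpoint can also be loaded directly with the transformers library. The sketch below assumes the checkpoint loads as a plain RoBERTa-style encoder; masked mean pooling stands in for the learned self-attentive pooling head (which is implemented in the ranker-server), so similarities may differ slightly from what the server returns.

```python
# Rough inference sketch; mean pooling is a stand-in for the learned
# self-attentive pooling head, which is served by the ranker-server.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "JetBrains-Research/rocq-language-theorem-embeddings"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

def embed(statement: str) -> torch.Tensor:
    """Return a 768-d embedding for one Rocq theorem statement."""
    enc = tokenizer(statement, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state        # (1, seq_len, 768)
    mask = enc["attention_mask"].unsqueeze(-1)          # (1, seq_len, 1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # masked mean pooling
    return torch.nn.functional.normalize(pooled, dim=-1).squeeze(0)

a = embed("Lemma plus_comm : forall n m : nat, n + m = m + n.")
b = embed("Lemma mult_comm : forall n m : nat, n * m = m * n.")
print("cosine similarity:", torch.dot(a, b).item())
```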

Training Details

Training Data

  • Corpus: 76 524 statement–proof pairs mined from 4 large Rocq projects.
  • Positive/negative mining: pairs with proof distance < τ₊ = 0.35 are positives; pairs with distance > τ₋ = 0.6 are negatives, with 4 hard negatives per anchor (sketched below).
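
A hedged sketch of this mining rule follows. The card only gives the two thresholds and the number of hard negatives; treating "hard" negatives as the negatives whose proofs are closest to the anchor's, and the record layout, are assumptions.

```python
# Hypothetical mining sketch; `anchor` and corpus items are assumed to be
# records with a `.proof` attribute, and `distance` is the proof distance.
POS_TAU = 0.35        # proof distance below this -> positive pair
NEG_TAU = 0.60        # proof distance above this -> negative pair
N_HARD_NEGATIVES = 4  # hard negatives kept per anchor

def mine_pairs(anchor, corpus, distance):
    """Return (positives, hard_negatives) for one anchor statement."""
    positives, negatives = [], []
    for other in corpus:
        if other is anchor:
            continue
        d = distance(anchor.proof, other.proof)
        if d < POS_TAU:
            positives.append(other)
        elif d > NEG_TAU:
            negatives.append((d, other))
    # "Hard" negatives: the negatives whose proofs are closest to the anchor's.
    negatives.sort(key=lambda pair: pair[0])
    return positives, [other for _, other in negatives[:N_HARD_NEGATIVES]]
```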

Training Procedure

  • Objective: InfoNCE contrastive loss (sketched after this list)
  • Batch size: 32
  • Optimizer / LR: AdamW, lr = 4e‑6, linear warm‑up 10 %, 22k steps
  • Hardware: 1× NVIDIA H100 GPU, 160 GB RAM, 14 h wall‑clock
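
A minimal sketch of the InfoNCE objective on statement embeddings is given below. The card only states the objective, batch size, and optimizer; the temperature value and the use of in-batch negatives are assumptions.

```python
# Hedged InfoNCE sketch with in-batch negatives; temperature is an assumption.
import torch
import torch.nn.functional as F

def info_nce_loss(anchor_emb, positive_emb, temperature: float = 0.07):
    """anchor_emb, positive_emb: (batch, dim) L2-normalised embeddings.
    Each anchor's positive is the matching row; other rows act as negatives."""
    logits = anchor_emb @ positive_emb.T / temperature      # (batch, batch) similarity logits
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)

# Example with random embeddings standing in for model outputs:
a = F.normalize(torch.randn(32, 768), dim=-1)
p = F.normalize(torch.randn(32, 768), dim=-1)
print(info_nce_loss(a, p).item())
```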

Evaluation

Testing Data, Factors & Metrics

  • Dataset: IMM‑300 (300 Rocq theorems) from the IMM project
  • Metrics: Downstream proof success rate of CoqPilot when given top‑7 retrieved premises; averaged over 12 generations.

Results

| Model (back-end) | Bucket (proof length) | Jaccard baseline | RocqStar |
|---|---|---|---|
| GPT-4o | ≤ 4 | 48 ± 5 % | 51 ± 5 % |
| GPT-4o | 5–8 | 18 ± 4 % | 25 ± 3 % |
| GPT-4o | 9–20 | 11 ± 4 % | 11 ± 5 % |
| Claude 3.5 | ≤ 4 | 58 ± 5 % | 61 ± 4 % |
| Claude 3.5 | 5–8 | 28 ± 5 % | 36 ± 5 % |
| Claude 3.5 | 9–20 | 16 ± 5 % | 21 ± 5 % |

Summary

RocqStar delivers consistent gains of up to 28 % relative improvement over Jaccard-index-based retrieval, with the largest gains on medium-length theorems, where proof similarity diverges most from statement similarity.
