RocqStar Ranker Embedder
A self‑attentive embedding model for premise/proof selection in the Rocq interactive theorem prover (ITP). RocqStar is fine‑tuned from CodeBERT so that distances in the embedding space correlate with proof similarity rather than with surface‑level statement overlap. It replaces BM25/Jaccard similarity in retrieval‑augmented generation (RAG) pipelines such as CoqPilot, leading to higher proof success rates on the IMM‑300 benchmark.
Model Details
Model Description
RocqStar is a 125 M‑parameter Transformer encoder (768‑dim hidden size) with multi‑head self‑attention and a learned self‑attentive pooling head. It is trained with an InfoNCE contrastive objective so that the cosine similarity of two statement embeddings approximates the similarity of their proofs, measured by a hybrid Levenshtein + Jaccard distance. The model takes tokenised Rocq (Gallina) theorem statements as input and outputs a 768‑d embedding.
- Model type: Transformer encoder with self‑attentive pooling
- Language(s): Rocq / Coq (Gallina) syntax (tokens)
- License: Apache‑2.0
- Fine‑tuned from: microsoft/codebert-base
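The self‑attentive pooling head can be pictured as a learned weighting of token representations that is averaged into a single statement vector. Below is a minimal PyTorch sketch assuming a single‑query attention pooler; the `SelfAttentivePooling` module is hypothetical and the exact head in the released checkpoint may differ.

```python
import torch
import torch.nn as nn

class SelfAttentivePooling(nn.Module):
    """Pool token embeddings into one statement embedding by learning
    per-token attention weights (illustrative sketch, not the released head)."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)  # one attention logit per token

    def forward(self, token_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
        scores = self.scorer(token_states).squeeze(-1)            # (batch, seq_len)
        scores = scores.masked_fill(attention_mask == 0, -1e9)    # ignore padding tokens
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)     # (batch, seq_len, 1)
        return (weights * token_states).sum(dim=1)                # (batch, hidden)
```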
Model Sources
- Repository: https://github.com/JetBrains-Research/big-rocq
- Paper: (coming soon)
How to Get Started
The server that must be running to use the model from the CoqPilot plugin lives in the ranker-server subdirectory of https://github.com/JetBrains-Research/big-rocq. It can also be used as a reference example of how to run inference with the model.
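For quick experimentation outside the plugin, the sketch below shows one way to embed and compare two statements with the transformers library. It assumes the checkpoint loads as a standard RoBERTa‑style encoder and substitutes masked mean pooling for the model's learned pooling head, so treat ranker-server as the authoritative reference.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "JetBrains-Research/rocq-language-theorem-embeddings"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID).eval()

def embed(statement: str) -> torch.Tensor:
    """Embed a Rocq theorem statement into a 768-d vector (sketch)."""
    inputs = tokenizer(statement, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state          # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)           # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)     # mean over real tokens

a = embed("Theorem plus_comm : forall n m : nat, n + m = m + n.")
b = embed("Lemma add_comm : forall a b : nat, a + b = b + a.")
print(torch.cosine_similarity(a, b).item())  # higher = proofs predicted to be more similar
```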
Training Details
Training Data
- Corpus: 76 524 statement–proof pairs mined from 4 large Rocq projects.
- Positive/negative mining: pairs with proof distance < τ₊ (0.35) are treated as positives, pairs with distance > τ₋ (0.6) as negatives, with 4 hard negatives per anchor (see the sketch below).
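As a rough illustration of the mining step, the sketch below blends a normalized Levenshtein distance over the raw proof scripts with a Jaccard distance over their tokens and applies the two thresholds. The equal weighting, whitespace tokenization, and the `proof_distance` / `label_pair` helpers are assumptions of this sketch, not the exact recipe.

```python
import Levenshtein  # pip install python-Levenshtein

TAU_POS, TAU_NEG = 0.35, 0.6  # mining thresholds from the training setup

def proof_distance(proof_a: str, proof_b: str) -> float:
    """Hybrid Levenshtein + Jaccard proof distance (equal weights assumed)."""
    lev = Levenshtein.distance(proof_a, proof_b) / max(len(proof_a), len(proof_b), 1)
    tokens_a, tokens_b = set(proof_a.split()), set(proof_b.split())
    jaccard = 1.0 - len(tokens_a & tokens_b) / max(len(tokens_a | tokens_b), 1)
    return 0.5 * lev + 0.5 * jaccard

def label_pair(proof_a: str, proof_b: str) -> str | None:
    """Label a pair of proofs for contrastive training, or skip the grey zone."""
    d = proof_distance(proof_a, proof_b)
    if d < TAU_POS:
        return "positive"
    if d > TAU_NEG:
        return "negative"
    return None  # neither clearly similar nor clearly dissimilar
```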
Training Procedure
- Objective: InfoNCE (see the sketch below)
- Batch size: 32
- Optimizer / LR: AdamW, lr = 4e‑6, linear warm‑up 10 %, 22k steps
- Hardware: 1× NVIDIA H100 GPU, 160 GB RAM, 14 h wall‑clock
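A minimal sketch of one InfoNCE step with the mined hard negatives is shown below. The temperature value and the use of only hard negatives (rather than additional in‑batch negatives) are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchor: torch.Tensor,
                  positive: torch.Tensor,
                  hard_negatives: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE over cosine similarities: each anchor statement must rank its
    positive above its 4 mined hard negatives (temperature is an assumed value)."""
    anchor = F.normalize(anchor, dim=-1)                  # (B, 768)
    positive = F.normalize(positive, dim=-1)              # (B, 768)
    hard_negatives = F.normalize(hard_negatives, dim=-1)  # (B, 4, 768)

    pos_sim = (anchor * positive).sum(dim=-1, keepdim=True)          # (B, 1)
    neg_sim = torch.einsum("bd,bkd->bk", anchor, hard_negatives)     # (B, 4)
    logits = torch.cat([pos_sim, neg_sim], dim=-1) / temperature     # (B, 5)
    targets = torch.zeros(logits.size(0), dtype=torch.long)          # positive sits at index 0
    return F.cross_entropy(logits, targets)
```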
Evaluation
Testing Data, Factors & Metrics
- Dataset: IMM‑300 (300 Rocq theorems) from the IMM project
- Metrics: Downstream proof success rate of CoqPilot when given top‑7 retrieved premises; averaged over 12 generations.
Results
| Model (back‑end) | Bucket (proof length) | Baseline (Jaccard) | RocqStar |
|---|---|---|---|
| GPT‑4o | ≤ 4 | 48 ± 5 % | 51 ± 5 % |
| GPT‑4o | 5–8 | 18 ± 4 % | 25 ± 3 % |
| GPT‑4o | 9–20 | 11 ± 4 % | 11 ± 5 % |
| Claude 3.5 | ≤ 4 | 58 ± 5 % | 61 ± 4 % |
| Claude 3.5 | 5–8 | 28 ± 5 % | 36 ± 5 % |
| Claude 3.5 | 9–20 | 16 ± 5 % | 21 ± 5 % |
Summary
RocqStar delivers consistent gains, up to a 28 % relative improvement over Jaccard‑index‑based retrieval, especially for medium‑length theorems, where proof similarity diverges most from statement similarity.
Model tree for JetBrains-Research/rocq-language-theorem-embeddings
- Base model: microsoft/codebert-base