coldchair16 committed 1566f7a · verified · 1 parent: 886fb2b

Update README.md

Files changed (1): README.md (+49 -3)

---
license: cc-by-nc-4.0
---

# CPRetriever-Code

**CPRetriever-Code** is a code embedding model trained via contrastive learning for **code-related retrieval tasks** in competitive programming. It achieves strong performance on tasks such as:

* **Text-to-Code** retrieval (problem description → relevant code)
* **Code-to-Code** retrieval (finding alternative solutions to the same problem)

This model is part of the [CPRet](https://github.com/coldchair/CPRet) suite for competitive programming retrieval research.

## 🔧 Usage

You can load this model using the `sentence-transformers` library:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("coldchair16/CPRetriever-Code")

# Encode a code snippet (mex_query returns the smallest non-negative integer not in arr)
embeddings = model.encode([
    "def mex_query(arr):\n    n = len(arr)\n    seen = set()\n    for i in range(n):\n        seen.add(arr[i])\n    i = 0\n    while True:\n        if i not in seen:\n            return i\n        i += 1"
])
```
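Retrieval then reduces to ranking a corpus by similarity to a query. The sketch below assumes that a query (such as a problem description) and a set of code snippets have already been embedded with `model.encode`. Plain Python lists stand in for the returned arrays, and `top_k_by_cosine` and the toy vectors are hypothetical, not part of CPRet.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_by_cosine(
    query: Sequence[float], corpus: List[Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Hypothetical helper: return (index, score) pairs for the k corpus
    vectors most similar to the query, highest score first."""
    scored = [(i, cosine(query, doc)) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for real model embeddings.
query_emb = [1.0, 0.0, 0.0]
code_embs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
print(top_k_by_cosine(query_emb, code_embs, k=2))
```

With real embeddings, `sentence_transformers.util.cos_sim` computes the same cosine scores in batch on the arrays returned by `model.encode`.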

## 💡 Applications

This model is optimized for **code-level semantic retrieval** in competitive programming settings:

* **Text-to-Code**: Retrieve relevant code snippets given a natural-language problem description.
* **Code-to-Code**: Retrieve alternative implementations that solve the same problem.

It is particularly effective for analyzing programming contest submissions, searching for solution variants, and building educational tools for code understanding.
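Code-to-code search over a pool of submissions can be sketched the same way. Each submission is used as a query against all the others and is excluded from its own results. This is a minimal illustration under stated assumptions: `nearest_variants` is a hypothetical helper, and the toy two-dimensional vectors stand in for real embeddings from `model.encode`.

```python
import math
from typing import Dict, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def nearest_variants(embs: List[Sequence[float]], k: int = 1) -> Dict[int, List[int]]:
    """Hypothetical helper: for every submission, return the indices of the
    k most similar *other* submissions (the query itself is excluded)."""
    unit = [normalize(e) for e in embs]
    result: Dict[int, List[int]] = {}
    for i, q in enumerate(unit):
        # The dot product of unit vectors equals their cosine similarity.
        scores = [
            (j, sum(a * b for a, b in zip(q, d)))
            for j, d in enumerate(unit)
            if j != i
        ]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        result[i] = [j for j, _ in scores[:k]]
    return result


# Toy embeddings: submissions 0 and 1 are near-duplicates; 2 is different.
subs = [[1.0, 0.1], [0.9, 0.2], [0.0, 1.0]]
print(nearest_variants(subs, k=1))
```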

## 📚 Training and Evaluation

CPRetriever-Code is trained via **contrastive learning** on positive and hard-negative code pairs derived from [CPRet-data](https://huggingface.co/datasets/coldchair16/CPRet-data).
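The README does not state the exact loss or its hyperparameters. As an illustration only, a common contrastive objective of this kind is InfoNCE: the query should score its positive higher than each hard negative under a temperature-scaled softmax. The sketch below computes it for a single query. The helper `info_nce` and the temperature `tau = 0.05` are assumptions, not CPRet's documented settings.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(
    query: Sequence[float],
    positive: Sequence[float],
    hard_negatives: List[Sequence[float]],
    tau: float = 0.05,  # assumed temperature, not taken from CPRet
) -> float:
    """InfoNCE loss for one query: the negative log softmax probability of the
    positive among [positive] + hard_negatives, using cosine similarity / tau."""
    logits = [cosine(query, positive) / tau]
    logits += [cosine(query, n) / tau for n in hard_negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denom - logits[0]


q = [1.0, 0.0]
pos = [0.9, 0.1]                 # toy positive
negs = [[0.0, 1.0], [0.5, 0.5]]  # toy hard negatives
print(info_nce(q, pos, negs))
```

Training code typically averages this loss over a batch. `sentence_transformers.losses.MultipleNegativesRankingLoss` is one library implementation of a batched variant that also accepts hard negatives.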

For the training pipeline, see the full project:
👉 [CPRet on GitHub](https://github.com/coldchair/CPRet?tab=readme-ov-file)

## 📦 Model Card

* Architecture: `Salesforce/SFR-Embedding-Code-2B_R` (encoder backbone)
* Training: Contrastive objective on code/code and text/code pairs
* Format: Compatible with `sentence-transformers`