zhangchenxu committed on
Commit
a211d89
·
verified ·
1 Parent(s): 61df073

Update README.md

  1. README.md +9 -19
README.md CHANGED
@@ -11,26 +11,20 @@ model-index:
11
  results: []
12
  ---
13
 
14
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
15
- should probably proofread and complete it, then remove this comment. -->
16
 
17
- # Qwen3-1.7B-SFT-TinyV_Reasoning_Balanced_v2.1_Qwen3-LR1.0e-5-EPOCHS2
 
 
18
 
19
- This model is a fine-tuned version of [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) on the TinyV_Reasoning_Balanced_v2.1_Qwen3 dataset.
20
 
21
- ## Model description
 
22
 
23
- More information needed
24
 
25
- ## Intended uses & limitations
26
-
27
- More information needed
28
-
29
- ## Training and evaluation data
30
-
31
- More information needed
32
-
33
- ## Training procedure
34
 
35
  ### Training hyperparameters
36
 
@@ -49,10 +43,6 @@ The following hyperparameters were used during training:
49
  - lr_scheduler_warmup_ratio: 0.1
50
  - num_epochs: 2.0
51
 
52
- ### Training results
53
-
54
-
55
-
56
  ### Framework versions
57
 
58
  - Transformers 4.52.4
 
11
  results: []
12
  ---
13
 
14
+ [**TinyV**](https://arxiv.org/abs/2505.14625) is a reward system for efficient RL post-training. It detects false negatives produced by current rule-based verifiers and uses a small LLM to provide more accurate reward signals during RL training. Experiments show that TinyV incurs only 6% additional computational cost while significantly increasing both RL efficiency and final model performance.
 
15
 
16
+ - 📄 [Technical Report](https://arxiv.org/abs/2505.14625) - Includes the false negative analysis and the theoretical insights behind TinyV
17
+ - 💾 [GitHub Repo](https://github.com/uw-nsl/TinyV) - Access the complete pipeline for more efficient RL training via TinyV
18
+ - 🤗 [HF Collection](https://huggingface.co/collections/zhangchenxu/tinyv-682d5840c7e309217df625df) - Training Data, Benchmarks, and Model Artifacts
19
 
20
+ This model is a fine-tuned version of Qwen/Qwen3-1.7B on the [zhangchenxu/TinyV_Think_Training_Data_Qwen3_Balanced](https://huggingface.co/datasets/zhangchenxu/TinyV_Think_Training_Data_Qwen3_Balanced) dataset.
21
 
22
+ ### Overview
23
+ ![TinyV Pipeline](https://huggingface.co/zhangchenxu/TinyV-1.5B/resolve/main/fn_tinyv_combine.png)
24
 
25
+ ### How to use it?
26
 
27
+ For usage details, please refer to the codebase at [https://github.com/uw-nsl/TinyV](https://github.com/uw-nsl/TinyV).
28
 
29
  ### Training hyperparameters
30
 
 
43
  - lr_scheduler_warmup_ratio: 0.1
44
  - num_epochs: 2.0
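
To make the two schedule settings above concrete, the sketch below converts `lr_scheduler_warmup_ratio: 0.1` and `num_epochs: 2.0` into a warmup step count. The steps-per-epoch figure is hypothetical, since the README does not state the dataset size or batch configuration, and the round-up behavior is an assumption modeled on how the Transformers `Trainer` derives warmup steps from a ratio. The helper name `warmup_steps` is illustrative and not part of any library.

```python
import math


def warmup_steps(steps_per_epoch: int, num_epochs: float, warmup_ratio: float) -> int:
    """Return the number of optimizer steps spent in learning-rate warmup."""
    # Total optimizer steps over the whole run.
    total_steps = math.ceil(steps_per_epoch * num_epochs)
    # Assumption: warmup length is the given fraction of total steps, rounded up.
    return math.ceil(total_steps * warmup_ratio)


# From the README: warmup ratio 0.1 and 2 epochs.
# HYPOTHETICAL: 1000 optimizer steps per epoch (not stated in the README).
steps = warmup_steps(steps_per_epoch=1000, num_epochs=2.0, warmup_ratio=0.1)
print(steps)
```

With these placeholder numbers, the learning rate would ramp up over the first tenth of the run before the rest of the schedule takes over.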
45
 
 
 
 
 
46
  ### Framework versions
47
 
48
  - Transformers 4.52.4