# FlowCoT Checkpoints

Checkpoints for the FlowCoT latent-reasoning code-generation model, built on Qwen3-8B-Base with a normalizing-flow latent path and VAE-encoded reasoning compression.
## Checkpoints

### `dual_path_stage2_checkpoint-22900`
Best dual-path (NF + CE) checkpoint. Trained in Stage 2 with `config_coding.yaml` (original cap schedule).

### `unified_stage1_checkpoint-3125`
Stage 1 of the new unified (single-forward-pass) architecture with the fixlen dataset caps. Training is ongoing.
## Evaluation: pass@1 (mean@16)

Comparison of `dual_path_stage2_checkpoint-22900` against LaDiR, a diffusion-based baseline with the same backbone and training data.
| Benchmark | FlowCoT (ours) | LaDiR |
|---|---:|---:|
| MBPP | 74.4% | 66.8% |
| MBPP+ | 77.5% | 59.5% |
| HumanEval | 82.9% | 87.4% |
| HumanEval+ | 77.8% | 73.2% |
FlowCoT numbers are pass@1 averaged over 16 samples per problem (8 seeds × 2 samples, temperature 0.6).
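The pass@1 (mean@16) metric above is the fraction of sampled completions that pass the benchmark's tests, averaged over problems. A minimal sketch of that computation (the function name and the boolean-matrix input format are illustrative, not from the FlowCoT codebase):

```python
import statistics

def pass_at_1_mean_at_k(results: list[list[bool]]) -> float:
    """results[i][j] is True iff sample j for problem i passed its tests.

    pass@1 (mean@k): per-problem pass rate over the k samples,
    then the mean of those rates across problems.
    """
    per_problem = [sum(samples) / len(samples) for samples in results]
    return statistics.mean(per_problem)

# Toy example with k=4 samples per problem (the card uses k=16):
# problem 0 passes 3/4 samples, problem 1 passes 1/4.
results = [
    [True, True, False, True],
    [False, False, True, False],
]
print(pass_at_1_mean_at_k(results))  # 0.5
```

With temperature sampling, each of the k completions is an independent draw, so this average is an unbiased estimate of single-sample (pass@1) accuracy.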
## Usage

Training and evaluation code: https://github.com/GMLR-Penn/FlowCoT
## Model tree for Penn-GMLR/FlowCoT_ckpt

Base model: Qwen/Qwen3-8B-Base