CudaLLM: A Language Model for High-Performance CUDA Kernel Generation
Model Description
cudaLLM-8B is a language model for generating high-performance, syntactically correct CUDA kernels. It is based on Qwen3-8B and was trained in two stages, supervised fine-tuning (SFT) and reinforcement learning (RL), to handle the complexities of parallel programming for GPUs.
Performance on KernelBench:
| | Bo1 | Bo2 | Bo4 | Bo8 | Bo16 |
|---|---|---|---|---|---|
| Level-1 | 79.75 | 83 | 84 | 86 | 87 |
| Level-2 | 67.30 | 70 | 71 | 72 | 73 |
| Level-3 | 20.83 | 26 | 30 | 34 | 36 |
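The Bo1 through Bo16 columns above are not defined in this card. As a rough illustration only, assuming they denote best-of-N sampling (a problem counts as solved if at least one of N sampled kernels passes), such a score could be computed as in the sketch below. The function name `best_of_n_rate`, the data layout, and the sample outcomes are all hypothetical and are not taken from KernelBench or from this model's evaluation code.

```python
from typing import Dict, List


def best_of_n_rate(results: Dict[str, List[bool]], n: int) -> float:
    """Percentage of problems where at least one of the first n samples passed.

    Assumption: `results` maps a problem id to per-sample pass/fail flags.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not results:
        raise ValueError("results must not be empty")
    solved = sum(1 for samples in results.values() if any(samples[:n]))
    return 100.0 * solved / len(results)


# Hypothetical pass/fail outcomes for three problems, four samples each.
example = {
    "problem_a": [True, False, False, False],
    "problem_b": [False, False, True, False],
    "problem_c": [False, False, False, False],
}

for n in (1, 2, 4):
    print(f"Bo{n}: {best_of_n_rate(example, n):.2f}")
```

Under this reading the score can only stay flat or rise as N grows, which is consistent with the trend in each row of the table.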
Training Procedure
The model was trained using the verl library. The following datasets were used for training and evaluation:
- SFT Dataset: A high-quality dataset of CUDA problem-solution pairs (sft_cuda_llm_r1.parquet), originally generated by DeepSeek R1, DeepSeek Coder-7B, and Qwen2-32B.
- RL Dataset: A refined dataset (rl_cuda_llm_0424.parquet) used to provide performance-based rewards during the RL stage.
- Evaluation Dataset: The model's performance was benchmarked against the KernelBench dataset.
Intended Use and Limitations
Intended Use
The primary use of CudaLLM is to assist developers in writing and optimizing high-performance CUDA kernels. It can be used for:
- Accelerating scientific computing and machine learning workloads.
- Serving as a co-pilot or productivity tool for HPC and CUDA developers.
- Researching AI-driven code generation and optimization.
Limitations and Bias
- Correctness is Not Guaranteed: While trained to produce correct code, the model's output should always be rigorously tested and verified before deployment in production systems.
- Security Risks: The generated code is not guaranteed to be secure. Inspect model-generated code carefully before running it, especially when it was obtained from an untrusted source.
- Performance Variability: Kernel performance can vary significantly depending on the target GPU architecture, input data sizes, and compiler version. The generated code may require further manual tuning.
- Specialized Domain: This model is highly specialized for CUDA code generation. Its performance on general-purpose programming tasks or natural language conversation will be limited.
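To make the correctness caveat above concrete, one common approach is to compare a generated kernel's output against a trusted reference within a floating-point tolerance. The sketch below shows only the comparison step, in plain Python. The helper names (`outputs_match`, `reference_vector_add`), the tolerances, and the stand-in output values are assumptions for illustration, not part of CudaLLM or KernelBench.

```python
import math
from typing import List, Sequence


def outputs_match(
    candidate: Sequence[float],
    reference: Sequence[float],
    rel_tol: float = 1e-4,
    abs_tol: float = 1e-6,
) -> bool:
    """Return True if both sequences have equal length and every pair of
    elements agrees within the given tolerances."""
    if len(candidate) != len(reference):
        return False
    return all(
        math.isclose(c, r, rel_tol=rel_tol, abs_tol=abs_tol)
        for c, r in zip(candidate, reference)
    )


def reference_vector_add(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Trusted CPU reference for an element-wise addition kernel."""
    return [x + y for x, y in zip(a, b)]


a = [1.0, 2.0, 3.0]
b = [0.5, 0.25, 0.125]
expected = reference_vector_add(a, b)

# Stand-in for values copied back from the GPU after running a generated
# kernel; a real harness would launch the kernel and read its output buffer.
candidate_output = [1.5, 2.25, 3.125]

print("kernel output accepted:", outputs_match(candidate_output, expected))
```

Exact equality is usually too strict for GPU results because floating-point operations may be reordered, so a tolerance-based check is the safer default. Suitable tolerances depend on the data type and the kernel.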