Commit 844c7c0 (verified, parent c397a0f) by preminstrel: Update README.md
---
pipeline_tag: text-generation
tags:
- code
- CUDA
---

## cudaLLM: A Language Model for High-Performance CUDA Kernel Generation

### Model Description
cudaLLM-8B is a language model for generating high-performance, syntactically correct CUDA kernels. It is based on Qwen3-8B and underwent a two-stage training process (supervised fine-tuning followed by reinforcement learning) to master the complexities of parallel programming for GPUs.
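A minimal inference sketch is shown below. The Hub repo id and the prompt are assumptions for illustration; the card does not state the exact repo location or prompt format, so adjust both to your setup.

```python
# Hypothetical repo id -- the model card does not state where the weights live.
MODEL_ID = "ByteDance-Seed/cudaLLM-8B"

# Example kernel-generation request (illustrative only).
PROMPT = (
    "Write an optimized CUDA kernel that computes the elementwise sum "
    "of two float32 vectors of length n."
)

def build_chat(prompt: str) -> list[dict]:
    """Wrap a kernel request in the chat-message format Qwen3-based models expect."""
    return [{"role": "user", "content": prompt}]

RUN_INFERENCE = False  # set True on a machine with the model weights available
if RUN_INFERENCE:
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    text = tok.apply_chat_template(
        build_chat(PROMPT), tokenize=False, add_generation_prompt=True
    )
    inputs = tok(text, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=1024)
    print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```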

**Performance on KernelBench:**

| Level   | Bo1   | Bo2 | Bo4 | Bo8 | Bo16 |
|---------|-------|-----|-----|-----|------|
| Level-1 | 79.75 | 83  | 84  | 86  | 87   |
| Level-2 | 67.30 | 70  | 71  | 72  | 73   |
| Level-3 | 20.83 | 26  | 30  | 34  | 36   |

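The Bo-N columns report best-of-N sampling: a problem counts as solved if any of N sampled kernels passes. A minimal sketch of that scoring rule, with stub pass/fail results standing in for real compile-and-check runs:

```python
def best_of_n(samples: list[bool]) -> bool:
    """A problem scores under Bo-N if at least one of its N samples passes."""
    return any(samples)

def bo_n_rate(per_problem_samples: list[list[bool]]) -> float:
    """Percentage of problems solved when taking the best of each sample set."""
    solved = sum(best_of_n(s) for s in per_problem_samples)
    return 100.0 * solved / len(per_problem_samples)

# Toy example: 4 problems, 2 samples each (stand-ins for real kernel checks).
results = [[True, False], [False, False], [False, True], [True, True]]
print(bo_n_rate(results))  # -> 75.0
```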
### Training Procedure
The model was trained using the verl library on the following datasets:
- SFT Dataset: A high-quality dataset of CUDA problem-solution pairs ([sft_cuda_llm_r1.parquet](https://huggingface.co/datasets/ByteDance-Seed/cudaLLM-data)), originally generated by DeepSeek R1, DeepSeek Coder-7B, and Qwen2-32B.
- RL Dataset: A refined dataset ([rl_cuda_llm_0424.parquet](https://huggingface.co/datasets/ByteDance-Seed/cudaLLM-data)) used to provide performance-based rewards during the RL stage.
- Evaluation Dataset: The model's performance was benchmarked on the KernelBench dataset.

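The card does not specify the exact reward function used in the RL stage; a common shaping for performance-based kernel rewards, shown purely as an illustration, gives zero to incorrect kernels and scales correct ones by their speedup over a baseline:

```python
def kernel_reward(correct: bool, baseline_ms: float, kernel_ms: float) -> float:
    """Illustrative performance-based reward (not the card's actual function):
    0 for incorrect or degenerate kernels, otherwise speedup over a baseline."""
    if not correct or kernel_ms <= 0.0:
        return 0.0
    return baseline_ms / kernel_ms

print(kernel_reward(True, 2.0, 1.0))   # correct, 2x faster -> 2.0
print(kernel_reward(False, 2.0, 0.5))  # incorrect -> 0.0
```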
### Intended Use and Limitations
#### Intended Use
The primary use of cudaLLM is to assist developers in writing and optimizing high-performance CUDA kernels. It can be used for:
- Accelerating scientific computing and machine learning workloads.
- Serving as a co-pilot or productivity tool for HPC and CUDA developers.
- Research into AI-driven code generation and optimization.

#### Limitations and Bias
- Correctness is not guaranteed: While trained to produce correct code, the model's output should always be rigorously tested and verified before deployment in production systems.
- Security risks: Generated code is not guaranteed to be secure. Never run model-generated code from an untrusted source without careful inspection.
- Performance variability: Kernel performance can vary significantly with the target GPU architecture, input sizes, and compiler version; generated code may require further manual tuning.
- Specialized domain: This model is highly specialized for CUDA code generation; its performance on general-purpose programming tasks or natural-language conversation will be limited.
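Per the correctness caveat above, a generated kernel should be validated against a trusted reference before use. A minimal numerical check, with pure-Python stand-ins in place of an actual compile-and-launch step:

```python
def reference_add(a: list[float], b: list[float]) -> list[float]:
    """Trusted reference implementation of the target operation."""
    return [x + y for x, y in zip(a, b)]

def run_generated_kernel(a: list[float], b: list[float]) -> list[float]:
    # Stand-in for compiling and launching the model-generated CUDA kernel.
    return [x + y for x, y in zip(a, b)]

def outputs_match(got: list[float], want: list[float], tol: float = 1e-5) -> bool:
    """Elementwise comparison with a small tolerance for floating-point error."""
    return len(got) == len(want) and all(abs(g - w) <= tol for g, w in zip(got, want))

a, b = [1.0, 2.0], [3.0, 4.5]
print(outputs_match(run_generated_kernel(a, b), reference_add(a, b)))  # -> True
```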