GLM-4-32B-Base-32K

GLM-4-32B-Base-32K is an enhanced version of THUDM's GLM-4-32B-Base-0414, specifically engineered to offer robust performance over an extended context window. While the original model's capabilities degraded after 8,192 tokens, this version maintains strong performance up to its full 32K (32,768-token) context, making it well suited to tasks requiring long-context understanding and processing.

This model was developed as a proof of concept to validate that a merging-centric approach to context extension can be applied successfully to larger-scale models. The techniques employed yielded an approximately 5% overall improvement on standard base-model benchmarks while significantly improving 32K recall.

More details can be found in our blog post, where we applied this work to our upcoming AFM 4.5B.

Model Details

Improvements

The primary improvement in this model is its enhanced long-context capability. The following methods were used to achieve this:

  • Targeted Long-Context Training: The model underwent continued pretraining on sequences up to its full 32,000 token context length.
  • Iterative Merging: Various model checkpoints were iteratively merged to combine the benefits of different training runs, enhancing both long-context and short-context performance (a sketch of one merging step follows this list).
  • Short-Context Distillation: Knowledge from the original high-performing short-context model was distilled into the long-context-trained model to recover and retain its initial capabilities on shorter tasks.
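
The following is a minimal sketch of a single merging step using spherical linear interpolation (SLERP) between two checkpoints. The checkpoint paths, the interpolation factor, and the choice of SLERP itself are illustrative assumptions; the model card does not specify the exact merge recipe or tooling used.

```python
# Sketch of one checkpoint-merging step via SLERP. Paths, t=0.5, and
# SLERP are assumptions for illustration, not the actual recipe.
import torch

def slerp(a: torch.Tensor, b: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate between two weight tensors, treating each
    tensor as a single flattened vector."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_dir = a_flat / (a_flat.norm() + eps)
    b_dir = b_flat / (b_flat.norm() + eps)
    omega = torch.acos(torch.clamp(torch.dot(a_dir, b_dir), -1.0, 1.0))
    if omega.abs() < eps:
        # Near-parallel tensors: fall back to plain linear interpolation.
        merged = (1.0 - t) * a_flat + t * b_flat
    else:
        so = torch.sin(omega)
        merged = (torch.sin((1.0 - t) * omega) / so) * a_flat \
               + (torch.sin(t * omega) / so) * b_flat
    return merged.reshape(a.shape).to(a.dtype)

# Hypothetical checkpoints: a long-context training run and the original base.
long_ctx = torch.load("glm4-32b-longctx/pytorch_model.bin", map_location="cpu")
base     = torch.load("glm4-32b-base/pytorch_model.bin", map_location="cpu")

merged = {name: slerp(long_ctx[name], base[name], t=0.5) for name in long_ctx}
torch.save(merged, "glm4-32b-merged/pytorch_model.bin")
```

In practice a step like this would be repeated across several checkpoints, evaluating after each merge to balance long-context recall against short-context regression.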

As a result, where the original model's performance on the Needle in a Haystack (NIAH) benchmark declines after 8,192 tokens, this extended version maintains reliable performance across the entire 32,768-token context window.
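
For intuition, here is a minimal needle-in-a-haystack probe using the Hugging Face transformers API. The prompt template, filler text, and needle are illustrative assumptions, not the actual NIAH harness that produced the numbers below.

```python
# A toy NIAH probe: bury a fact in ~28K tokens of filler and ask for it.
# Assumes the transformers and accelerate libraries and enough GPU memory
# for a 32B model in bf16.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/GLM-4-32B-Base-32K"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

needle = "The magic number is 48213."
filler = "The grass is green. The sky is blue. " * 3000  # roughly 28K tokens
mid = len(filler) // 2
prompt = (filler[:mid] + needle + filler[mid:]
          + "\nQuestion: What is the magic number?\nAnswer:")

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=10)
print(tok.decode(out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True))
```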

Benchmarks

| Benchmark | GLM-4-32B-Base-0414 | GLM-4-32B-Base-32K |
|---|---|---|
| arc_challenge | 59.39% | 64.93% |
| arc_easy | 85.44% | 87.88% |
| hellaswag | 64.75% | 65.40% |
| mmlu | 77.05% | 77.87% |
| piqa | 81.61% | 83.19% |
| truthfulqa_mc2 | 49.27% | 50.07% |
| winogrande | 78.69% | 80.03% |

NIAH Benchmark Results Comparison

| Model | Task | 4,096 | 8,192 | 16,384 | 24,576 | 32,768 |
|---|---|---|---|---|---|---|
| GLM-4-32B-Base-0414 | niah_single_1 | 100.0% | 100.0% | 77.0% | 5.2% | 1.2% |
| GLM-4-32B-Base-0414 | niah_single_2 | 100.0% | 100.0% | 73.4% | 2.6% | 0.0% |
| GLM-4-32B-Base-0414 | niah_single_3 | 100.0% | 99.8% | 48.0% | 1.4% | 0.0% |
| GLM-4-32B-Base-32K | niah_single_1 | 100.0% | 100.0% | 100.0% | 99.2% | 99.6% |
| GLM-4-32B-Base-32K | niah_single_2 | 100.0% | 100.0% | 99.2% | 80.2% | 68.8% |
| GLM-4-32B-Base-32K | niah_single_3 | 100.0% | 99.6% | 95.6% | 86.6% | 61.0% |

NIAH Averages

| Model | 4,096 | 8,192 | 16,384 | 24,576 | 32,768 |
|---|---|---|---|---|---|
| GLM-4-32B-Base-0414 | 100.0% | 99.9% | 66.1% | 3.1% | 0.4% |
| GLM-4-32B-Base-32K | 100.0% | 99.9% | 98.3% | 88.7% | 76.5% |

Use Cases

This model serves as a new base for continued training at a 32K context length; a minimal sketch follows.
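
Below is a minimal continued-pretraining sketch using the Hugging Face Trainer API. The dataset, hyperparameters, and single-process setup are placeholder assumptions that only show the API shape; in practice a 32B model requires multi-GPU sharding (e.g., FSDP or DeepSpeed).

```python
# Continued pretraining sketch; dataset and hyperparameters are
# placeholders, not a recommended recipe.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "arcee-ai/GLM-4-32B-Base-32K"
tok = AutoTokenizer.from_pretrained(model_id)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token  # base models often lack a pad token
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

# Placeholder corpus; any long-document dataset tokenized to <= 32,768 tokens works.
ds = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
ds = ds.map(lambda x: tok(x["text"], truncation=True, max_length=32768),
            batched=True, remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="glm4-32k-cpt",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=16,
                           bf16=True, learning_rate=1e-5,
                           num_train_epochs=1),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()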

License

GLM-4-32B-Base-32K is released under the MIT License, consistent with the original model's license.

If you have questions or would like to share your experiences using GLM-4-32B-Base-32K, please connect with us on social media. We're excited to see what you build, and how this model helps you innovate!
