
CHANGELOG

2024-7-16

version: v1.0.0

changed:
- [1]feat: upload the first version

2024-10-26

version: v1.0.1

changed:
- [1]feat: add citation

2024-11-18

version: v1.0.2

changed:
- [1]feat: add three models: Qwen2.5-72B, Qwen2.5-32B, Qwen2-72B
- [2]feat: add subclass: Discrimination

2024-11-24

version: v1.0.3

changed:
- [1]feat: add three Qwen instruct models
- [2]feat: remove Qwen base models
- [3]feat: update some models' names

2024-12-28

version: v1.0.4

changed:
- [1]feat: update 9 models per the December todo-list:
    - QwQ-32B-Preview
    - Llama-3.1-70B-Instruct
    - Llama-3.3-70B-Instruct
    - Mistral-Nemo-Instruct-2407
    - Ministral-8B-Instruct-2410
    - Phi-3-small-8k-instruct
    - Phi-3-small-128k-instruct
    - Phi-3-medium-4k-instruct
    - Phi-3-medium-128k-instruct

2025-4-13

version: v1.0.5

changed:
- [1]feat: update 4 models per the February todo-list:
    - phi-4
    - DeepSeek-R1-Distill-Llama-70B
    - Mistral-Small-24B-Instruct-2501
    - Moonlight-16B-A3B-Instruct
- [2]feat: release a test set of 20,000 samples

2025-7-1

version: v1.0.6

changed:
- [1]feat: update many models per the April todo-list:
    - Llama-4-maverick
    - Gemini-2.5-flash-preview-05-20
    - Deepseek-chat-v3-0324
    - Qwen3
    - Gemma-3
    - OpenThinker2