twinkle-ai's Collections
🏎️ Formosa-1 Series
💾 Traditional Chinese Datasets
🧠 Traditional Chinese Reasoning Datasets
📋 Eval Logs
📋 Eval Logs
Updated 16 days ago
Benchmark logs generated with Twinkle Eval, recording each model's output for every prompt.
Upvote
3
twinkle-ai/gpt-oss-eval-logs-and-scores • Viewer • Updated 16 days ago • 2.63k • 146 • 1
twinkle-ai/llama-4-eval-logs-and-scores • Viewer • Updated Apr 9 • 750 • 14 • 2