shareAI

non-profit


AI & ML interests

None defined yet.

Recent Activity

Evanwu50020 updated a model 5 days ago

shareAI/gemma3-r1-12b

Evanwu50020 published a model 5 days ago

shareAI/gemma3-r1-12b

WDong authored a paper about 2 months ago

Language Models as Continuous Self-Evolving Data Engineers


shareAI's activity

lianghsun

posted an update 1 day ago


With the arrival of Twinkle April — Twinkle AI’s annual open-source celebration held every April — our community is excited to unveil its very first project:

📊 Twinkle Eval (https://github.com/ai-twinkle/Eval), a next-generation evaluation tool led by our contributor @tedslin .

Unlike traditional evaluation tools such as iKala’s ievals (https://github.com/ikala-ai/ievals), which can only evaluate language models (LMs) one sample at a time, Twinkle Eval is designed with Large Reasoning Models (LRMs) in mind. As reasoning time grows with more complex models, sample-by-sample tools become increasingly inefficient 😲: for example, evaluating LRMs on the ikala/tmmluplus benchmark could run for half a day without finishing.
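To illustrate why per-sample evaluation becomes the bottleneck, here is a minimal Python sketch of evaluating samples concurrently. It is not Twinkle Eval's actual code: `query_model` and `evaluate_parallel` are hypothetical names, and a fixed sleep stands in for model latency. Because each request spends most of its time waiting on the model server, a thread pool can keep many requests in flight at once.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def query_model(question: str) -> str:
    """Hypothetical stand-in for a request to a model-serving endpoint."""
    time.sleep(0.1)  # simulate reasoning latency
    return "A"


def evaluate_parallel(samples, max_workers=8):
    """Score (question, gold_answer) pairs with up to `max_workers` requests in flight."""
    questions = [question for question, _ in samples]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so predictions stay aligned with samples.
        predictions = list(pool.map(query_model, questions))
    correct = sum(pred == gold for pred, (_, gold) in zip(predictions, samples))
    return correct / len(samples)


# Hypothetical data: half of the gold answers match the stub's constant "A".
samples = [("question", "A")] * 8 + [("question", "B")] * 8
start = time.perf_counter()
accuracy = evaluate_parallel(samples)
print(f"accuracy={accuracy:.2f}, elapsed={time.perf_counter() - start:.2f}s")
```

With 16 samples, 0.1 s of simulated latency each, and 8 workers, the run takes roughly 0.2 s instead of about 1.6 s sequentially. Real speedups depend on how many concurrent requests the serving backend can handle.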

One question we were especially curious about:
Does shuffling multiple-choice answer order impact model accuracy? 🤔
→ See: "Change Answer Order Can Decrease MMLU Accuracy" – arXiv:2406.19470v1

To address these challenges, Twinkle Eval brings three key innovations to the table:

1️⃣ Parallelized evaluation of samples
2️⃣ Multi-round testing for stability
3️⃣ Randomized answer order to test robustness
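The stability and robustness ideas (2️⃣ and 3️⃣) can be sketched as follows, assuming each sample is a `(question, choices, answer_idx)` tuple and `predict(question, choices)` returns the index of the chosen option. The data layout and all function names are hypothetical and not taken from Twinkle Eval. Each round reshuffles the options, remaps the gold index accordingly, and records accuracy; the spread across rounds shows how sensitive a model is to answer order.

```python
import random
from statistics import mean


def shuffle_choices(choices, answer_idx, rng):
    """Return the options in a random order plus the gold answer's new index."""
    order = list(range(len(choices)))
    rng.shuffle(order)
    # new_choices[k] is the option originally at position order[k].
    new_choices = [choices[i] for i in order]
    # The gold option now sits wherever its original index landed in `order`.
    new_answer_idx = order.index(answer_idx)
    return new_choices, new_answer_idx


def multi_round_accuracy(dataset, predict, rounds=3, seed=0):
    """Evaluate `rounds` times with a fresh shuffle per round.

    Returns (per_round_accuracies, mean_accuracy).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    accuracies = []
    for _ in range(rounds):
        correct = 0
        for question, choices, answer_idx in dataset:
            shuffled, gold = shuffle_choices(choices, answer_idx, rng)
            if predict(question, shuffled) == gold:
                correct += 1
        accuracies.append(correct / len(dataset))
    return accuracies, mean(accuracies)


def content_aware(question, choices):
    """Hypothetical model that reads the option text."""
    return choices.index("4")


def always_first(question, choices):
    """Hypothetical model with pure position bias."""
    return 0


# Hypothetical example: every question's correct option is "4".
dataset = [("2 + 2 = ?", ["3", "4", "5", "6"], 1)] * 20
print(multi_round_accuracy(dataset, content_aware))
print(multi_round_accuracy(dataset, always_first))
```

A model that actually reads the option text scores the same in every round, while one that always picks the first option sees its score move with each shuffle.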

After running experiments, we observed that Twinkle Eval can speed up evaluation by up to 15× 🚀🚀. Interestingly, most models scored slightly lower under settings 2️⃣ and 3️⃣ (multi-round testing with randomized answer order) than their claimed performance, which suggests further benchmarking is needed.

This framework also comes with additional tunable parameters and detailed logging of LM behavior per question — perfect for those who want to dive deeper. 😆

If you find Twinkle Eval useful, please ⭐ the project and help spread the word 🤗



WDong

authored a paper about 2 months ago

Language Models as Continuous Self-Evolving Data Engineers

Paper • 2412.15151 • Published Dec 19, 2024 • 2

ZhuangXialie

authored a paper about 2 months ago

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More

Paper • 2502.07490 • Published Feb 11 • 9

Baicai003

updated a Space about 2 months ago

shareAI

🚀

StarRing2022

updated 3 models about 2 months ago

StarRing2022

updated a dataset about 2 months ago

shareAI/Alpaca-Distill-R1-ZH

Updated Feb 6 • 179k • 196 • 15

StarRing2022

published 2 models about 2 months ago

shareAI/llama3.2-3b-r1-GGUF

Updated Feb 6 • 1

shareAI/llama3.2-1b-r1-GGUF

Updated Feb 6 • 3

StarRing2022

published a dataset about 2 months ago

shareAI/Alpaca-Distill-R1-ZH

Updated Feb 6 • 179k • 196 • 15

StarRing2022

published a model about 2 months ago

shareAI/qwen2.5-0.5b-r1-GGUF

Updated Feb 6 • 4

Felguk

in shareAI/Felguk0.5-turbo-preview 2 months ago

Any update?

#1 opened 2 months ago by

Baicai003

in shareAI/Felguk0.5-turbo-preview 2 months ago

Any update?

#1 opened 2 months ago by

Baicai003

Alanturner2

updated a Space 2 months ago

Arxiv Summarizer

🔥

Summarize arxiv papers and chat with your data

Alanturner2

published a Space 2 months ago

Arxiv Summarizer

🔥

Summarize arxiv papers and chat with your data

lianghsun

posted an update 3 months ago


🖖 Let me introduce the work I've done over the past three months: Llama-3.2-Taiwan-3B and Llama-3.2-Taiwan-3B-Instruct, now open-sourced on 🤗 Hugging Face.

lianghsun/Llama-3.2-Taiwan-3B: This model is built on top of meta-llama/Llama-3.2-3B with continual pretraining. The training dataset consists of a mixture of Traditional Chinese and multilingual texts in specific proportions, including 20B tokens of Traditional Chinese text.

lianghsun/Llama-3.2-Taiwan-3B-Instruct: This is a fine-tuned conversational model based on the foundation model.

This Llama-3.2-Taiwan open-source project is currently a one-person effort (yes, I did everything myself, starting from text preparation; so exhausting!). If you're interested, feel free to join the Discord server for discussions.

BENCHMARKING

The evaluation was conducted on ikala/tmmluplus, though the README does not yet reflect the latest results. Performance is close to that of the previous versions, suggesting that further improvements may require adding more specialized knowledge to the datasets.

A CALL FOR SUPPORT

If anyone is willing to provide compute resources, it would be greatly appreciated to help this project continue and grow. 💪

---
🏔️ Foundation model: lianghsun/Llama-3.2-Taiwan-3B
🤖 Instruction model: lianghsun/Llama-3.2-Taiwan-3B-Instruct
⚡ GGUF: lianghsun/Llama-3.2-Taiwan-3B-Instruct-GGUF


Felguk

updated a Space 3 months ago

VeTools

😻

Thk


Team members 123
