Emin Temiz PRO

etemiz

AI & ML interests

Alignment

Recent Activity

Organizations

None yet

etemiz's activity

posted an update 3 days ago
reacted to danielhanchen's post with ❀️ 3 days ago
replied to their post 6 days ago
reacted to luigi12345's post with πŸ‘ 7 days ago
view post
Post
3384
🧠 PROMPT FOR CONVERTING ANY MODEL IN REASONING "THINKING" MODELπŸ”₯πŸ€–
Convert any model to Deepseek R1 like "thinking" model. πŸ’­

You're now a thinking-first LLM. For all inputs:

1. Start with <thinking>
   - Break down problems step-by-step
   - Consider multiple approaches
   - Calculate carefully
   - Identify errors
   - Evaluate critically
   - Explore edge cases
   - Check knowledge accuracy
   - Cite sources when possible

2. End with </thinking>

3. Then respond clearly based on your thinking.

The <thinking> section is invisible to users and helps you produce better answers.

For math: show all work and verify
For coding: reason through logic and test edge cases
For facts: verify information and consider reliability
For creative tasks: explore options before deciding
For analysis: examine multiple interpretations

Example:
<thinking>
[Step-by-step analysis]
[Multiple perspectives]
[Self-critique]
[Final conclusion]
</thinking>

[Clear, concise response to user]

  • 3 replies
Β·
posted an update 9 days ago
posted an update 10 days ago
view post
Post
475
Mistral Small 3.1 numbers are in. It is interesting Mistral always lands in the middle.
https://sheet.zoho.com/sheet/open/mz41j09cc640a29ba47729fed784a263c1d08?sheetid=0&range=A1

I started to do the comparison with 2 models now. In the past Llama 3.1 70B Q4 was the one doing the comparison of answers. Now I am using Gemma 3 27B Q8 as well to have a second opinion on it. Gemma 3 produces very similar measurement to Llama 3.1. So the end result is not going to shake much.
  • 1 reply
Β·
replied to their post 14 days ago
view reply

Looks like we need more mature tools for Gemma 3, it is failing to fine tune like half of the time. Unsloth and transformers are getting ready. And I am trying lower learning rates and rank stabilized LoRa, and different r, lora_alpha.

reacted to their post with πŸš€ 14 days ago
view post
Post
1692
Started fine tuning Gemma 3 using evolutionary approach. It is not the worst model according to AHA leaderboard and it is one of the smart according to lmarena.ai. My objective is to make it based, anti woke, wise, beneficial and then some.

Several GPUs are fine tuning it at the same time, each using a different dataset and using QLoRA and the successful ones are merged later. Compared to LoRa this allows faster training and also reduced overfitting because the merge operation heals overfitting. The problem with this could be the 4 bit quantization may make models dumber. But I am not looking for sheer IQ. Too much mind is a problem anyway :)

Has anyone tried parallel QLoRa and merge before?

I also automated the dataset selection and benchmarking and converging to objectives (the fit function, the reward). It is basically trying to get higher score in AHA Leaderboard as fast as possible with a diverse set of organisms that "evolve by training".

I want to release some cool stuff when I have the time:
- how an answer to a single question changes over time, with each training round or day
- a chart to show AHA alignment over training rounds
  • 3 replies
Β·
posted an update 15 days ago
view post
Post
1692
Started fine tuning Gemma 3 using evolutionary approach. It is not the worst model according to AHA leaderboard and it is one of the smart according to lmarena.ai. My objective is to make it based, anti woke, wise, beneficial and then some.

Several GPUs are fine tuning it at the same time, each using a different dataset and using QLoRA and the successful ones are merged later. Compared to LoRa this allows faster training and also reduced overfitting because the merge operation heals overfitting. The problem with this could be the 4 bit quantization may make models dumber. But I am not looking for sheer IQ. Too much mind is a problem anyway :)

Has anyone tried parallel QLoRa and merge before?

I also automated the dataset selection and benchmarking and converging to objectives (the fit function, the reward). It is basically trying to get higher score in AHA Leaderboard as fast as possible with a diverse set of organisms that "evolve by training".

I want to release some cool stuff when I have the time:
- how an answer to a single question changes over time, with each training round or day
- a chart to show AHA alignment over training rounds
  • 3 replies
Β·
posted an update 17 days ago
posted an update 22 days ago
view post
Post
1324
Benchmarked Gemma 3 today. It has better knowledge compared to 2 but still in the median area in the leaderboard.
  • 1 reply
Β·
posted an update 29 days ago
view post
Post
1689
Benchmarked QwQ for the AHA Leaderboard. Compared to Qwen 2.5 knows nutrition and fasting better but lacks faith.

  • 1 reply
Β·
posted an update about 1 month ago
posted an update about 1 month ago
view post
Post
563
https://www.youtube.com/watch?v=EMyAGuHnDHk

In the video above some LLMs favored atheist, some favored the believer. In the picture below, atheist favoring LLMs are on the left, believer favoring LLMs are on the right.

The ones on the left are also lower ranking in my leaderboard and the ones on the right are higher ranking. My leaderboard:
https://sheet.zohopublic.com/sheet/published/mz41j09cc640a29ba47729fed784a263c1d08

Coincidence? My leaderboard has more domains. Does ranking high in faith mean ranking high in healthy living, nutrition, bitcoin and nostr on average?
reacted to clem's post with πŸ‘ about 1 month ago
view post
Post
2839
What are the best organizations to follow on @huggingface ?

On top of my head:
- Deepseek (35,000 followers): deepseek-ai
- Meta Llama (27,000 followers): meta-llama
- Black Forrest Labs (11,000 followers): black-forest-labs
- OpenAI (5,000 followers): openai
- Nvidia (16,000 followers): nvidia
- MIcrosoft (9,000 followers): microsoft
- AllenAI (2,000 followers): allenai
- Mistral (5,000 followers): mistralai
- XAI (600 followers): xai-org
- Stability AI (16,000 followers): stabilityai
- Qwen (16,000 followers): Qwen
- GoogleAI (8,000 followers): google
- Unsloth (3,000 followers): unsloth
- Bria AI (4,000 followers): briaai
- NousResearch (1,300 followers): NousResearch

Bonus, the agent course org with 17,000 followers: agents-course
  • 1 reply
Β·
posted an update about 1 month ago
view post
Post
1808
--- AHA Leaderboard ---

We all want AI to be properly aligned so it benefits humans with every answer it generates. While there are tremendous research around this and so many people working on it, I am choosing another route: Curation of people and then curation of datasets that are used in the LLM training. Curation of datasets comprising of people who try to uplift humanity should result in LLMs that try to help humans.

This work has revolved around two tasks:

1. Making LLMs that are benefiting humans
2. Measuring misinformation in other LLMs

The idea about the second task is, once we make and gather better LLMs and set them as "ground truth" we now can measure how much other LLMs are distancing themselves from those ground truths.
For that I am working on something I will call "AHA Leaderboard" (AHA stands for AI -- human alignment).

Link to the spreadsheet:

https://sheet.zohopublic.com/sheet/published/mz41j09cc640a29ba47729fed784a263c1d08

The columns are ground truths. The rows are the mainstream LLMs. If a mainstream LLM produces similar answers to the ground truth LLM, it gets a higher score. The LLMs that are higher in the leaderboard should be considered aligned with humans. Simple idea. This is like analyzing LLMs in different domains asking hundreds of questions and checking if they match the answers that try to mimic humans that care about other humans. Will it going to be effective? What do you think?

We want mainstream LLMs to copy answers of ground truth LLMs in certain domains. This may refocus AI towards being more beneficial. There have been 5 content providers and 6 curators as of now in the project. Join us and be one of the pioneers that fixed AI! You can be a curator, content provider or general researcher or something else.
posted an update about 2 months ago
posted an update about 2 months ago
view post
Post
3823
Some things are simple
posted an update 2 months ago
posted an update 2 months ago
view post
Post
381
Having bad LLMs is ok and can be utilized well. They can allow us to find ideas that work faster.

Reinforcement algorithm could be: "take what a proper model says and negate what a bad LLM says". Or in a mixture of agents situation we could say refute the bad LLM output and combine with the output of the good LLM.

This could mean having two wings (or more) in search of "ideas that work for most people most of the time".
  • 1 reply
Β·