emin temiz PRO

etemiz

etemiz's activity

reacted to merve's post with 👀 about 12 hours ago
posted an update about 13 hours ago
A model that does well on math, reasoning, science and other benchmarks may not do well in the wisdom domain.

There do not seem to be many models focusing on wisdom. That is going to be a problem: smartness does not equal human alignment.
  • 1 reply
posted an update 2 days ago
Should I create an organization tackling the AI-human alignment problem? The idea is to find the humans who care most about other humans and basically pretrain on what they produce. I already did some experiments and it seems to work well.

Want to know about my experiments?

Who would be interested in joining?
replied to singhsidhukuldeep's post 2 days ago

As I read more about it, it looks more groundbreaking.

This, combined with the "Training Large Language Models to Reason in a Continuous Latent Space" paper, is pretty important imo.

reacted to singhsidhukuldeep's post with 🚀 2 days ago
Exciting breakthrough in AI: @Meta's new Byte Latent Transformer (BLT) revolutionizes language models by eliminating tokenization!

The BLT architecture introduces a groundbreaking approach that processes raw bytes instead of tokens, achieving state-of-the-art performance while being more efficient and robust. Here's what makes it special:

>> Key Innovations
Dynamic Patching: BLT groups bytes into variable-sized patches based on entropy, allocating more compute power where the data is more complex. This results in up to 50% fewer FLOPs during inference compared to traditional token-based models.
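
Below is a minimal sketch of the entropy-patching idea, assuming a hypothetical next-byte probability function and a fixed threshold (the paper's actual entropy model and boundary rule are more involved):

```python
import math
from typing import Callable, List

def entropy_patches(data: bytes,
                    next_byte_probs: Callable[[bytes], List[float]],
                    threshold: float = 2.0) -> List[bytes]:
    """Group bytes into variable-sized patches: open a new patch whenever the
    predicted next-byte entropy rises above `threshold`, i.e. where the data
    gets harder and deserves more compute."""
    patches, current = [], bytearray()
    for i, b in enumerate(data):
        probs = next_byte_probs(data[:i])        # small "entropy model" scores the prefix
        entropy = -sum(p * math.log2(p) for p in probs if p > 0)
        if current and entropy > threshold:      # high entropy -> patch boundary
            patches.append(bytes(current))
            current = bytearray()
        current.append(b)
    if current:
        patches.append(bytes(current))
    return patches
```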

Three-Component Architecture:
• Lightweight Local Encoder that converts bytes to patch representations
• Powerful Global Latent Transformer that processes patches
• Local Decoder that converts patches back to bytes
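
A rough sketch of how these three pieces could fit together, in PyTorch with assumed module sizes and mean-pooled patches (not Meta's actual implementation):

```python
import torch
import torch.nn as nn

class BLTSketch(nn.Module):
    """Toy byte -> patch -> byte pipeline: a small local encoder pools bytes into
    patch vectors, a larger global latent transformer processes the patches, and
    a small local decoder maps patch states back toward next-byte logits."""
    def __init__(self, d_local=256, d_global=1024, n_global_layers=8):
        super().__init__()
        self.byte_embed = nn.Embedding(256, d_local)
        self.local_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_local, nhead=4, batch_first=True), num_layers=1)
        self.to_global = nn.Linear(d_local, d_global)
        self.global_transformer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_global, nhead=8, batch_first=True),
            num_layers=n_global_layers)
        self.to_local = nn.Linear(d_global, d_local)
        self.local_decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_local, nhead=4, batch_first=True), num_layers=1)
        self.byte_logits = nn.Linear(d_local, 256)

    def forward(self, byte_ids: torch.Tensor, patch_sizes: list) -> torch.Tensor:
        # byte_ids: (1, n_bytes); patch_sizes must sum to n_bytes
        x = self.local_encoder(self.byte_embed(byte_ids))            # (1, n_bytes, d_local)
        # mean-pool each variable-sized patch into one patch vector
        patches = torch.stack([p.mean(dim=1)
                               for p in x.split(patch_sizes, dim=1)], dim=1)
        latents = self.global_transformer(self.to_global(patches))   # (1, n_patches, d_global)
        # broadcast each patch latent back to its bytes and decode
        per_byte = torch.cat([latents[:, i:i + 1].expand(-1, s, -1)
                              for i, s in enumerate(patch_sizes)], dim=1)
        y = self.local_decoder(x + self.to_local(per_byte))
        return self.byte_logits(y)                                    # (1, n_bytes, 256)

# usage (hypothetical): logits = BLTSketch()(torch.randint(0, 256, (1, 12)), [3, 5, 4])
```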

>> Technical Advantages
• Matches performance of Llama 3 at 8B parameters while being more efficient
• Superior handling of non-English languages and rare character sequences
• Remarkable 99.9% accuracy on spelling tasks
• Better scaling properties than token-based models

>> Under the Hood
The system uses an entropy model to determine patch boundaries, cross-attention mechanisms for information flow, and hash n-gram embeddings for improved representation. The architecture allows simultaneous scaling of both patch and model size while maintaining fixed inference costs.
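
A minimal sketch of the hash n-gram embedding idea, assuming a simple bucket table and Python's built-in hash (the paper's exact hash functions and how they combine with the byte embeddings differ):

```python
import torch
import torch.nn as nn

class HashNGramEmbedding(nn.Module):
    """Add bucketed n-gram context to each byte position: hash every n-gram
    ending at a byte into a fixed number of buckets and sum the looked-up vectors."""
    def __init__(self, dim=256, n_buckets=100_000, ngram_sizes=(3, 4, 5)):
        super().__init__()
        self.ngram_sizes = ngram_sizes
        self.n_buckets = n_buckets
        self.table = nn.Embedding(n_buckets, dim)

    def forward(self, data: bytes) -> torch.Tensor:
        vecs = []
        for i in range(len(data)):
            # n-grams ending at position i (shorter near the start of the sequence);
            # Python's hash is process-salted, a real system would use a fixed hash
            bucket_ids = [hash(data[max(0, i - n + 1): i + 1]) % self.n_buckets
                          for n in self.ngram_sizes]
            vecs.append(self.table(torch.tensor(bucket_ids)).sum(dim=0))
        return torch.stack(vecs)          # (len(data), dim)
```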

This is a game-changer for multilingual AI and could reshape how we build future language models. Excited to see how this technology evolves!
  • 2 replies
replied to their post 3 days ago

It is not OK to remove people from the equation, however efficient the machines are. We can never be sure that the synthetic matches the original in terms of alignment, and further models trained on further synthetics can derail the whole thing.

replied to their post 3 days ago

That's the hard part. Careful analysis over a long time, and how many people are benefiting from them and from their friends, can give some clues. If someone's solutions have worked most of the time, for many people, over the years, they may be eligible to get into a curated LLM.

posted an update 4 days ago
What if human alignment is easy:
- Get a list of humans who really care about other humans
- Feed what they say into an LLM
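
For what it's worth, a minimal sketch of the second step, assuming you already have a curated list of authors and a corpus stored as one JSON record per line; all names and fields here are hypothetical:

```python
# Hypothetical sketch: keep only texts written by curated authors, then save
# them as a plain-text corpus for pretraining / fine-tuning.
import json

CURATED_AUTHORS = {"author_a", "author_b"}   # placeholder names, not a real list

def build_alignment_corpus(records_path: str, out_path: str) -> int:
    kept = 0
    with open(records_path) as src, open(out_path, "w") as dst:
        for line in src:                     # each line: {"author": ..., "text": ...}
            rec = json.loads(line)
            if rec.get("author") in CURATED_AUTHORS:
                dst.write(rec["text"].strip() + "\n\n")
                kept += 1
    return kept
```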
reacted to their post with 🧠 4 days ago
posted an update 5 days ago
As more synthetic datasets are made, we move slowly away from human alignment.
  • 4 replies
posted an update 11 days ago
Pretraining is mostly what I do. Some ideas need to be emphasized by retraining.

Better curation is possible by emphasizing certain texts.
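
One very simple way to emphasize certain texts when assembling a training mix is to upsample them; a hedged sketch with made-up weights:

```python
import random

def build_weighted_mix(docs, seed: int = 0):
    """Upsample documents by an emphasis weight: a document with weight 3.0
    appears ~3x as often in the shuffled training mix as one with weight 1.0."""
    rng = random.Random(seed)
    mix = []
    for text, weight in docs:
        # integer part gives guaranteed copies; fractional part is applied probabilistically
        copies = int(weight) + (1 if rng.random() < weight % 1 else 0)
        mix.extend([text] * copies)
    rng.shuffle(mix)
    return mix

# e.g. build_weighted_mix([("curated essay ...", 3.0), ("generic web text ...", 1.0)])
```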
reacted to julien-c's post with 😎 14 days ago
After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)

docs: https://huggingface.co/docs/hub/storage-limits

We optimize our infrastructure continuously to scale our storage for the coming years of growth in machine learning, to the benefit of the community 🔥

cc: @reach-vb @pierric @victor and the HF team
posted an update 15 days ago
I've been maintaining this leaderboard for a while; it tries to track what is beneficial for humans. It is somewhat subjective now, but it will get better thanks to more contributions and reduced bias.

There is a lot of difference of opinion among open-source models. Which models are closer to the solutions that will work in different situations? How far are the models from reality?

https://wikifreedia.xyz/based-llm-leaderboard/npub1nlk894teh248w2heuu0x8z6jjg2hyxkwdc8cxgrjtm9lnamlskcsghjm9c
posted an update 18 days ago
Apparently you can't count on centralized AI to perform consistently: some days it is great, some days it is bad. Providers may be distilling or doing other things to dumb it down and make it cost-effective. But you can count on open-source LLMs that you run locally to perform at the same level every day.

So you always have to watch centralized AI, but you never have to watch a local LLM.
replied to jsulz's post 27 days ago

Trying to download the QwQ 32B GGUF. It disconnected something like 30 times.

replied to jsulz's post 28 days ago

It may increase HF's efficiency by offloading traffic to users instead of you serving all the files.

replied to jsulz's post 28 days ago

Maybe you can also create torrents for popular files?

reacted to their post with ❤️ about 1 month ago
reacted to victor's post with 🔥 about 1 month ago
Perfect example of why Qwen/Qwen2.5-Coder-32B-Instruct is insane?

Introducing: AI Video Composer 🔥
huggingface-projects/ai-video-composer

Drag and drop your assets (images/videos/audios) to create any video you want using natural language!

It works by asking the model to output a valid FFmpeg command, which can be quite complex, but most of the time Qwen2.5-Coder-32B gets it right (that thing is a beast). It's an update of an old project made with GPT-4; it was almost impossible to make it work with open models back then (~1.5 years ago), but not anymore, let's go open weights 🚀.
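
A rough sketch of that pattern (ask the model for a single ffmpeg command, then run it); the InferenceClient call, model choice, and prompt wording below are assumptions, not the Space's actual code:

```python
# Hedged sketch of the "LLM writes the ffmpeg command" pattern.
import subprocess
from huggingface_hub import InferenceClient

client = InferenceClient("Qwen/Qwen2.5-Coder-32B-Instruct")

def compose(instruction: str, files: list) -> None:
    prompt = (
        "You are given these input files: " + ", ".join(files) + ". "
        f"Write a single valid ffmpeg command that does the following: {instruction}. "
        "Output only the command, nothing else."
    )
    resp = client.chat_completion(messages=[{"role": "user", "content": prompt}],
                                  max_tokens=300)
    command = resp.choices[0].message.content.strip()
    print("Running:", command)
    subprocess.run(command, shell=True, check=True)   # review the command before running it for real

# compose("make a 10 second slideshow from the images with the audio as soundtrack",
#         ["img1.jpg", "img2.jpg", "music.mp3"])
```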
posted an update about 1 month ago