Knut Jägersberg

KnutJaegersberg

AI & ML interests

NLP, opinion mining, narrative intelligence

Recent Activity

liked a model 2 days ago
NX-AI/xLSTM-7b
liked a model 2 days ago
ibm-fms/Bamba-9B
liked a model 2 days ago
OpenGVLab/InternVL2_5-38B-MPO

Organizations

LLMs, Blog-explorers, Qwen, Social Post Explorers, M4-ai, Chinese LLMs on Hugging Face

KnutJaegersberg's activity

posted an update 6 days ago
reacted to sayakpaul's post with 🤗 15 days ago
Introducing a high-quality open-preference dataset to further this line of research for image generation.

Despite preference data being such an inseparable component of modern image generation, open preference datasets are a rarity!

So, we decided to work on one with the community!

Check it out here:
https://huggingface.co/blog/image-preferences
reacted to ariG23498's post with 🤗 20 days ago
posted an update 20 days ago
posted an update 25 days ago
posted an update 29 days ago
posted an update 4 months ago
appvoid/arco

arco consistently outperforms every SOTA model below 600M parameters on average

posted an update 4 months ago
posted an update 4 months ago
posted an update 5 months ago
posted an update 5 months ago
reacted to merve's post with 👍 6 months ago
posted an update 6 months ago
Unsocial Intelligence: an Investigation of the Assumptions of AGI Discourse

I don't agree with some of the assertions made here, but it is an interesting paper and a good overview.

https://arxiv.org/abs/2401.13142
reacted to merve's post with ❤️ 6 months ago
Florence-2 is a new vision foundation model capable of a wide variety of tasks 🤯
Demo 👉🏻 gokaygokay/Florence-2
Collection 👉🏻 microsoft/florence-6669f44df0d87d9c3bfb76de

This model can handle tasks that vary from OCR to semantic segmentation.

The difference from previous models is that the authors compiled a dataset of 126M images with 5.4B annotations, labelled with their own data engine and pseudolabelled by smaller specialized models and APIs.

The model has a similar architecture to previous models: an image encoder and a multimodality encoder with a text decoder. The authors have compiled the multitask dataset with prompts for each task.

You can also fine-tune this model on any task of your choice. The authors also report results on downstream tasks with the vision encoder frozen and unfrozen 🤓📉
They have released fine-tuned models too, you can find them in the collection above 🤗
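Since the post describes how a single prompt token selects the task, here is a minimal inference sketch, assuming the remote-code interface shown on the microsoft/Florence-2-large model card (the task tokens and the post_process_generation helper are taken from that card, not from this post, and may differ between releases):

```python
# Minimal sketch: task-prompted Florence-2 inference via transformers,
# following the pattern in the microsoft/Florence-2-large model card.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-large"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg")  # any RGB image

# The task is selected purely by a prompt token, e.g. <CAPTION>, <OD>, <OCR>.
task = "<CAPTION>"
inputs = processor(text=task, images=image, return_tensors="pt")

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=256,
    num_beams=3,
)
raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
# Parse the raw output into a task-specific structure (caption, boxes, ...).
print(processor.post_process_generation(raw, task=task,
                                         image_size=(image.width, image.height)))
```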
reacted to merve's post with 🔥 6 months ago
Finally @CVPR2024 is here! 🩷
Have you claimed your papers and linked your models/datasets/demos?
This will increase visibility and impact of your paper 💫

To index your papers, go here: CVPR2024/CVPR2024-papers
Find your paper, click on the paper page link, index the paper, then click on your name (workflow is below 👇🏻)
If you'd like to add links to your paper, go here: CVPR2024/update-CVPR2024-papers
Log in, find your paper's ID, retrieve the paper, fill in the info, and submit!
replied to s3nh's post 7 months ago

Don't burn out! Lighten up again, will you?

posted an update 7 months ago
reacted to s3nh's post with ❤️ 7 months ago
GPU Poor POV: Burnout

Sometimes we do not have the energy to post about AI and new methods.
And that's totally OK, I guess.
Remember to sleep well and drink a lot of water. Have a great day :D <3
replied to BramVanroy's post 9 months ago

It mixed up stuff in the output and gave weird answers. I didn't have that problem with other models. Maybe the update they released solved that issue; I just never cared, given the alternatives.

reacted to BramVanroy's post with 👍 9 months ago
Does anyone have experience with finetuning Gemma? Even the 2B variant feels more memory-heavy than Mistral 7B. I know that its vocabulary is much larger (250k), but I'm a bit surprised that the max batch size I can fit on an A100 80GB is only 2, whereas I could fit 4 with Mistral 7B, even though Gemma is much smaller except for the embedding layer. Both runs used FA, the same sequence length, and the same DeepSpeed ZeRO-3 settings. Oh, and yes, I'm using the most recent hotfix of transformers that solves a memory issue with Gemma and others.

Any prior experience that you can share, or suggestions to improve throughput?
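As a rough illustration of where the extra memory goes, here is a back-of-the-envelope sketch (my own, not from the post). It counts only the input-embedding weights, assuming the published configs (Gemma 2B: 256k vocabulary, hidden size 2048; Mistral 7B: 32k vocabulary, hidden size 4096), and ignores activations, gradients, and optimizer states:

```python
# Back-of-the-envelope: embedding-weight sizes only, from published configs.
def embedding_params(vocab_size: int, hidden_size: int) -> int:
    return vocab_size * hidden_size

configs = {
    "gemma-2b": (256_000, 2_048),   # 256k vocab, d_model 2048
    "mistral-7b": (32_000, 4_096),  # 32k vocab, d_model 4096
}
for name, (vocab, hidden) in configs.items():
    n = embedding_params(vocab, hidden)
    print(f"{name}: {n / 1e6:.0f}M embedding params ≈ {n * 2 / 1e9:.2f} GB in bf16")

# gemma-2b: 524M embedding params ≈ 1.05 GB in bf16
# mistral-7b: 131M embedding params ≈ 0.26 GB in bf16
# The batch_size x seq_len x vocab_size logits tensor (and its gradient)
# also scales with vocabulary size, which hits Gemma much harder per sample.
```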