Ross Wightman

rwightman

AI & ML interests

Computer vision, transfer learning, semi/self supervised learning, robotics.

Organizations

Hugging Face, PyTorch Image Models, Spaces-explorers, Flax Community, LAION eV, Pixel Parsing

rwightman's activity

upvoted an article about 8 hours ago

🚀 Deploying OLMo-7B with Text Generation Inference (TGI) on Hugging Face Spaces

By ariG23498 • 5
New activity in timm/ViT-L-16-SigLIP-384 about 9 hours ago
upvoted an article 8 days ago

Open-R1: a fully open reproduction of DeepSeek-R1

• 621
reacted to merve's post with 🔥 11 days ago
Oof, what a week! 🥵 So many things have happened, let's recap! merve/jan-24-releases-6793d610774073328eac67a9

Multimodal 💬
- We have released SmolVLM -- tiniest VLMs that come in 256M and 500M, with its retrieval models ColSmol for multimodal RAG 💗
- UI-TARS are new models by ByteDance to unlock agentic GUI control 🤯 in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released Minimax-VL-01, whose decoder is based on the MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE), a challenging new multimodal benchmark

LLMs 📖
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1 with MIT license! 🤯
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, new family of models and their datasets (SFT and reward ones too!)

Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO

Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris similar to Flux
- Tencent released Hunyuan3D-2, a new model for 3D asset generation from images
New activity in safetensors/convert 12 days ago

FWIW, I just found a bug in the pipeline: it applies sigmoid instead of softmax by default (https://github.com/huggingface/transformers/pull/35848), so add function_to_apply='softmax' if you want softmax probs. This is not specific to the timm integration and looks like it's been there a while. I did confirm that if you set the labels as above, the fine-tuned timm model will predict with the correct labels and should push to the Hub with those as well.
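
To see why the choice of activation matters for the reported scores, here is a minimal sketch in plain Python comparing sigmoid and softmax on the same logits. It is not the Transformers implementation, and the logits are made up for illustration:

```python
import math

def sigmoid(logits):
    # Squash each logit independently into (0, 1); scores need not sum to 1.
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]

def softmax(logits):
    # Normalize across classes so the scores form a probability distribution.
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical classifier outputs for 3 classes

sig = sigmoid(logits)
soft = softmax(logits)

print("sigmoid:", [round(s, 3) for s in sig], "sum =", round(sum(sig), 3))
print("softmax:", [round(s, 3) for s in soft], "sum =", round(sum(soft), 3))
```

Both pick the same top class, but only the softmax scores sum to 1 across classes; the sigmoid scores are independent per-class values, which is why passing function_to_apply='softmax' changes the numbers you get back.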


@davidrs this part of the example, which resets the classifier and labels for the target dataset, is important:

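from transformers import AutoModelForImageClassification

# checkpoint, num_labels, id2label and label2id come from the target dataset setup
model = AutoModelForImageClassification.from_pretrained(
    checkpoint,
    num_labels=num_labels,  # size of the new classifier head
    id2label=id2label,      # label mappings for the target dataset
    label2id=label2id,
    ignore_mismatched_sizes=True,  # allow the pretrained head to be replaced
)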

If you don't do that, I believe it gets pushed with the original labels, e.g. as an ImageNet-1k (1000-class) classifier. Though I guess the lowest 'n' classes that are in the new dataset would still be fine-tuned with the target labels if the new dataset has fewer classes; it'd crash if the new dataset has more classes.
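
For reference, the label arguments passed in the snippet above can be built from the target dataset's class names. A minimal sketch, assuming the class names are available as a plain list (the `class_names` values here are hypothetical):

```python
# Hypothetical class names for the target dataset, in label-index order.
class_names = ["cat", "dog", "bird"]

# Map integer class ids to names and back.
id2label = {i: name for i, name in enumerate(class_names)}
label2id = {name: i for i, name in id2label.items()}
num_labels = len(class_names)

print(num_labels, id2label, label2id)
```

With a real dataset the names would come from its label feature rather than a hard-coded list.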


@davidrs hmm, it's possible there's an issue with the label handling. A change was made to the integration near release to keep the labels compatible with timm use (keeping the label_names field instead of id2label/label2id). There's actually a mix of the two cases, and many of @ariG23498's fine-tunes have id2label, though I was told at release time it should be producing label_names...

Do you have a public model I could look at? Are you using the example image classification script in Transformers (https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification), or using Trainer directly in a custom script/notebook as in the example above?

New activity in safetensors/convert 14 days ago

Update convert.py

#37 opened 14 days ago by rwightman