Ross Wightman
Deploying OLMo-7B with Text Generation Inference (TGI) on Hugging Face Spaces
Open-R1: a fully open reproduction of DeepSeek-R1
Multimodal
- We released SmolVLM, the tiniest VLMs yet, in 256M and 500M sizes, along with their retrieval models ColSmol for multimodal RAG
- UI-TARS is a new family of models by ByteDance for agentic GUI control, in 2B, 7B, and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released MiniMax-VL-01, whose decoder is based on the MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE), a new challenging multimodal benchmark
LLMs
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, plus six distilled dense models, on par with o1, with an MIT license!
- Qwen2.5-Math-PRM: new math process reward models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, a new family of models, along with their datasets (SFT and reward ones too!)
Audio
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO
Image/Video/3D Generation
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris, similar to Flux
- Tencent released Hunyuan3D-2, new 3D asset generation from images
Allow running conversion after closing a previous PR.
FWIW, I just found a bug in the pipeline: it applies sigmoid instead of softmax by default (https://github.com/huggingface/transformers/pull/35848), so add function_to_apply='softmax' if you want softmax probabilities. This is not specific to the timm integration, and it looks like it's been there a while. I did confirm that if you set the labels as above, the fine-tuned timm model will predict with the correct labels and should push to the Hub with those as well...
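The practical difference is easy to see on raw logits: sigmoid scores each class independently, while softmax normalizes across classes into a proper distribution. A minimal NumPy sketch (illustrative only, not the pipeline internals):

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])  # raw classifier outputs for 3 classes

# sigmoid: independent per-class scores, generally do not sum to 1
sigmoid = 1.0 / (1.0 + np.exp(-logits))

# softmax: normalized across classes, always sums to 1
softmax = np.exp(logits) / np.exp(logits).sum()

print(sigmoid.sum())  # > 1 for these logits, not a distribution
print(softmax.sum())  # 1.0
```

Passing function_to_apply='softmax' to the pipeline call forces the normalized behavior until the default is fixed.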
@davidrs this part in the example, which resets the classifier / labels to the target dataset, is important:

```py
model = AutoModelForImageClassification.from_pretrained(
    checkpoint,
    num_labels=num_labels,         # number of classes in the target dataset
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,  # allow replacing the original classifier head
)
```
If you don't do that, I believe the model gets pushed with the original labels, e.g. as an ImageNet-1k (1000-class) classifier. If the new dataset has fewer classes, I guess the lowest n classes would still get fine-tuned on the target data; if it has more classes, it'd crash.
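For reference, the num_labels / id2label / label2id arguments above can be derived from the target dataset's class names. A minimal sketch, where the hypothetical list ['cat', 'dog', 'bird'] stands in for something like dataset.features['label'].names from a HF datasets object:

```python
# hypothetical class names; with HF datasets this would typically be
# labels = dataset.features["label"].names
labels = ["cat", "dog", "bird"]

num_labels = len(labels)
id2label = {i: name for i, name in enumerate(labels)}
label2id = {name: i for i, name in enumerate(labels)}

print(num_labels)        # 3
print(id2label[0])       # cat
print(label2id["bird"])  # 2
```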
@davidrs hmm, it's possible there's an issue with the label handling. A change was made to the integration near release to keep the labels compatible with timm use (keeping the label_names field instead of id2label/label2id). There's actually a mix of the two cases, and many of @ariG23498's fine-tunes have id2label, though I was told at release time it should be producing label_names...
Do you have a public model I could look at? Are you using the example image classification script in Transformers (https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification), or using Trainer directly in a custom script/notebook as in the example above?
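To check which convention a given fine-tune ended up with, one can inspect its config.json for a label_names field vs id2label/label2id. A small sketch over plain dicts (label_convention is a hypothetical helper, not a Transformers API):

```python
def label_convention(config: dict) -> str:
    """Report which label convention a model config dict carries."""
    if "label_names" in config:
        return "label_names"
    if "id2label" in config:
        return "id2label"
    return "none"

print(label_convention({"label_names": ["cat", "dog"]}))     # label_names
print(label_convention({"id2label": {0: "cat", 1: "dog"}}))  # id2label
```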