
Ross Wightman
rwightman's activity
Request: DOI
Non-English language pdfs
Where are the answers to the questions in the dataset?


timm/ViT-B-16-SigLIP2-256

timm/ViT-SO400M-16-SigLIP2-512

timm/ViT-L-16-SigLIP2-512
Model size seems odd

I've got my hands on an AMD Instinct MI100. Used, it's about the same price as a V100, but on paper it has more TFLOPS (V100 ~14 vs MI100 ~23), and the HBM has a faster clock, so memory bandwidth is 1.2 TB/s.
For quantized inference it's a beast (the MI50 was also surprisingly fast).
For LoRA training, in this quick test I could not make the bnb (bitsandbytes) config work, so I'm running the fine-tune on the full-size model.
I'll share everything I've learned about the install, setup and settings in a blog post, together with the 3D design for the cooling shroud.
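
(For context, a minimal sketch of the kind of 4-bit bitsandbytes + LoRA setup being described; the model id and hyperparameters below are placeholders, not the actual config from this post, and bitsandbytes has historically been CUDA-first, which may be why the config fails on ROCm.)

```python
# Hypothetical 4-bit bitsandbytes + LoRA config (placeholder model id and
# hyperparameters, not the config from the post above).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,   # the MI100 supports bfloat16 natively
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",              # placeholder model id
    quantization_config=bnb_config,          # drop this line to fine-tune the full-size model
    device_map="auto",
)

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],     # which modules to adapt depends on the architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
```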

Yeah, it's 112 TFLOPS for the PCIe V100 and 125 for the SXM, I think. One thing I was never clear on with the MI100 and other MIxx chip specs is whether their float16 'matrix' numbers are float16 matrix multiply with float32 accumulate (which is what you'd want). The datacenter NVIDIA 'tensor core' FLOPS are usually quoted with float32 accumulate (unless it's a gamer card, in which case that rate is halved).
The MI100 does have native bfloat16, which is a big win over the V100.
I do feel, though, that you're getting good TOPS/$ here because AMD hasn't been that successful competing with NVIDIA on the full system offering (chips + driver/software). I've really, really wanted this to change, but AMD keeps frustrating... how are you finding it so far in terms of issues / crashes / head banging? :) Hopefully things have been improving.
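
(A rough numeric probe, just a sketch rather than a definitive test, can hint at which way a given part accumulates: sum many small fp16 values whose running total a pure fp16 accumulator can't track accurately, and compare against an fp32 reference.)

```python
import torch

# Rough probe for fp16 GEMM accumulation precision; assumes a CUDA/ROCm device.
n = 16384
a = torch.full((1, n), 1.0, dtype=torch.float16, device="cuda")
b = torch.full((n, 1), 0.01, dtype=torch.float16, device="cuda")

ref = (a.float() @ b.float()).item()   # fp32 matmul on the same data as a reference
out = (a @ b).item()                   # fp16 GEMM under test
print(f"fp16 GEMM: {out:.3f}   fp32 reference: {ref:.3f}")
# A result close to the reference is consistent with fp32 accumulation;
# a clearly low result points to reduced-precision accumulation in the GEMM.
```

(On the PyTorch side there's also torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction, which controls whether fp16 GEMMs are allowed to use reduced-precision reductions.)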

FWIW, the MI100 was released after the A100, 3 years after the V100... that says something :) Also, it's the matrix / tensor core mixed or reduced precision FLOPS that are of interest, not the float32 FLOPS, which are where the 14 & 23 numbers come from.


timm/ViT-gopt-16-SigLIP2-384

timm/ViT-gopt-16-SigLIP2-256

timm/ViT-SO400M-16-SigLIP2-512

timm/ViT-SO400M-16-SigLIP2-384

timm/ViT-SO400M-16-SigLIP2-256

timm/ViT-SO400M-14-SigLIP2-378
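
(A minimal sketch of loading one of the SigLIP2 checkpoints listed above through OpenCLIP's Hugging Face Hub support; this assumes the repo ships OpenCLIP-format weights, as the timm SigLIP releases do, and the random tensor stands in for a preprocessed image.)

```python
import torch
import open_clip

# Load image + text towers for one of the SigLIP2 checkpoints listed above.
model, preprocess = open_clip.create_model_from_pretrained('hf-hub:timm/ViT-B-16-SigLIP2-256')
tokenizer = open_clip.get_tokenizer('hf-hub:timm/ViT-B-16-SigLIP2-256')
model = model.eval()

image = torch.randn(1, 3, 256, 256)    # stand-in for preprocess(pil_image).unsqueeze(0)
text = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    img_emb = torch.nn.functional.normalize(model.encode_image(image), dim=-1)
    txt_emb = torch.nn.functional.normalize(model.encode_text(text), dim=-1)
    sims = img_emb @ txt_emb.T         # cosine similarity of the image to each caption
print(sims)
```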
