Releases of the past week are here
merve/releases-june-13-6852c3c1eaf1e0c24c958860
Here are our picks 🤗
So many interesting models were released this past week in open AI! 🤗
🖼️ Computer Vision/VLMs
> nanonets/Nanonets-OCR-s is the new state-of-the-art OCR model that can handle checkboxes, watermarks, tables (OS)
> Meta released facebook/v-jepa-2-6841bad8413014e185b497a6, new SOTA video embeddings with two new classification models (OS)
> ByteDance-Seed/SeedVR2-3B is a new 3B video restoration model (OS)
Audio
> Stepfun released stepfun-ai/Step-Audio-AQAA, a new large (137B 🤯) audio language model that takes in audio and generates audio (OS)
🤖 Robotics
> NVIDIA released nvidia/GR00T-N1.5-3B, a new open foundation vision-language-action model
3D
> tencent/Hunyuan3D-2.1 is the new version of Hunyuan by Tencent that can generate 3D assets from text and image prompts